Speech and Hearing 





By 

Harvey Fletcher, Ph.D.

Acoustical Research Director
Bell Telephone Laboratories, Inc.


WITH AN INTRODUCTION 

BY 


H. D. ARNOLD, Ph.D.

Director of Research
Bell Telephone Laboratories, Inc.



NEW YORK 

D. VAN NOSTRAND COMPANY, Inc. 

EIGHT WARREN STREET 


1929 





Copyright, 1929, by 

D. VAN NOSTRAND COMPANY, Inc. 

All rights reserved, including that of translation
into the Scandinavian and other foreign languages.


First Published, January, 1929
Second Printing, September, 1929






PRINTED IN U. S. A.


PRESS OF
BRAUNWORTH & CO., INC.
BOOK MANUFACTURERS
BROOKLYN, NEW YORK



PREFACE 


Some fifteen years ago the Research Laboratories of the 
Bell Telephone System undertook a comprehensive survey of 
speech and hearing to obtain the fundamental facts on which 
to base the design of apparatus and systems for telephone use. 
The art of analyzing electrical currents into their component
frequencies and accurately measuring them, and the correlated
art of describing and measuring the characteristics of mechanically
vibrating systems, had reached an advanced state of
development. It was apparent that great advantages would 
come from similarly analyzing speech and hearing, for if we 
could accurately describe every part of the system from the 
voice through the telephone instruments to and including the 
ear, we could engineer the parts at our disposal with greater 
intelligence. 

A greater plan for a prolonged laboratory investigation was
evolved and has been in operation since that time. The attack
was first launched most vigorously on the constitution of
speech in an effort to establish a reasonable description of
average speech, and to find to what extent small imperfections
and variations in speech affected intelligibility. It is obvious
that this work could not go far without entailing a study of
the organs of speech and the organs and mechanisms of hearing,
and its scope came to be extended as well to some of the
abnormalities in these faculties.

As the work progressed it became apparent that better and
more precise instruments must be developed than were available,
and a considerable part of the effort has been devoted to
the matter of securing devices which would convert sound
waves into electrical form and reconvert them again to sound
with the least possible distortion. Out of this have come
unexpected rewards to the telephone and phonograph arts, for
as these devices were perfected they found very immediate
application to the great advantage of those industries.

One of the most difficult phases of the investigation has
been that relating to the degree of precision with which the
mind can differentiate and interpret sounds that are very
nearly alike. This does not lend itself so readily to analysis
and measurement as does the purely mechanical operation of
the ear itself. The approach to this problem has been through
the use of essentially perfect reproduction systems which could
be deteriorated step by step until their faults became noticeable
to the observer. This set a limit to the degree of perfection
which could ever be demanded in the apparatus. When
the deterioration was carried somewhat further an estimate
could be obtained of the degree of dissatisfaction presented by
certain measured imperfections, and hence a practical basis for
choice of a reasonably perfect system could be established.

As this large program was undertaken with definite ideas of
meeting and solving whatever difficulties might arise in its
progress, it is not surprising that it has brought us a wealth
of new and useful information on the fundamentals of speech
and hearing. Although of course the nature of this information
could not be predicted in advance, it is a foregone conclusion
that no such persistent and thorough-going study can be
carried through without large additions to the philosophy of
the subject. The principal unforeseen and unexpected return
has been found in the application of the unusually perfect
devices developed in the course of the experimentation to
practical commercial problems. Some such unlooked-for
applications will naturally arise from any thorough-going
research, but the magnitude of the results that were obtained
by adapting the devices of these experiments to the phonograph,
to radio, and to the problems of the deafened could
hardly have been visioned by the most ardent protagonist of
research as a speculative enterprise.

The work as originally planned is far from finished;
as it has progressed new problems have arisen that will
occupy us for years to come. Meanwhile, however, part of it
has already been the inspiration of a book by Dr. I. B.
Crandall on “Vibrating Systems and Sound.” Now another
part is crystallized in the present volume, taking the general
form in which it was presented in Bell Laboratories' program of
“out-of-hour courses.”

The author desires to acknowledge the brilliant work of
the large number of the staff of Bell Telephone Laboratories
who have contributed to the work which he reports; to recognize
the helpful criticisms and suggestions offered by the members
of the staff of the American Telephone and Telegraph
Company who have been following it; to thank many of his
colleagues who have greatly aided the preparation of the
material for this book; and to express his sincere gratitude to
Dr. Arnold for his constant inspiration and sympathetic
understanding of the many intricate problems that have arisen
during the progress of the research work.

Harvey Fletcher.

New York,
December 1, 1928.




CONTENTS


Introduction

PART ONE: SPEECH

I. Mechanism of Speaking
II. Characteristics of Speech Waves
III. Speech Power
IV. Frequency of Occurrence of the Different Speech Sounds

PART TWO: MUSIC AND NOISE

I. Physical Properties of Musical Sounds
II. Noise

PART THREE: HEARING

I. Mechanism of Hearing
II. Limits of Audition
III. Minimum Perceptible Differences in Sound
IV. Masking Effects
V. Binaural Beats
VI. Methods of Testing the Acuity of Hearing

PART FOUR: THE PERCEPTION OF SPEECH AND MUSIC

I. The Loudness of Sounds
II. The Recognition of the Pitch of Musical Tones
III. Methods of Measuring the Recognition of Speech Sounds
IV. Effect of Changes in the Received Intensity of Speech Sounds upon Their Recognition
V. Effect of Frequency Distortion upon the Recognition of Speech Sounds
VI. Effect of Other Types of Distortion upon the Recognition of Speech Sounds
VII. Effect of Noise and Deafness upon the Recognition of Speech Sounds



INTRODUCTION 


The atmosphere of sounds in which we live ministers so
constantly to our knowledge and enjoyment of our surroundings
that through long familiarity we have come to feel, if
not contempt, at least indifference toward the marvelous
mechanism through which it works. Hearing, we are inclined
to consider as little a matter for concern as breathing; and so
long as our own faculty remains unimpaired we feel little 
curiosity concerning the provisions of nature either for our- 
selves or for others. When we hear too faintly or indistinctly 
we know we need only trace the sound to its source to hear 
its perfect form, for that is the method we have used from 
childhood in investigating the sounds of our immediate neigh- 
borhood. 

Now with one broad sweep the barriers of time and space 
are gone and all the world becomes our vocal neighborhood. 
No longer can we transport ourselves to the origin of a sound 
and thus become convinced that we are hearing it aright, for 
that origin may be thousands of miles away or may have 
vanished years before; and so we must establish a new method 
to measure the accuracy of the copy which reaches our ears. 
We must also find a clearer index to our satisfaction in it, for
we are no longer concerned with the immutable provisions of
nature but may approach at corresponding expense whatever
perfection we may demand in our instruments of translation
and reproduction. Thus the telephone and the phonograph
should excite a keener interest in how we hear and in what
measures our satisfaction in the speech and music which
they provide.

Our ears are only machines to translate air waves into a
form suited to stimulate the auditory nerve; and as machines
we may measure and describe them in the same terms that
apply to devices we ourselves construct. We may compare 
them as to performance, and may accommodate our devices 
to their requirements. But, to understand the mechanism of 
the ear is by no means to understand the act of hearing, for we 
have not heard until the brain has perceived the message sent 
by the auditory nerve. We cannot explain in precise mechan- 
ical terms how this is done, nor indeed have we any very clear 
comprehension of the process at present. Some important 
factors relating to the process of hearing we can, however, 
determine by measuring the least changes in sound which can 
be detected under a variety of conditions of pitch, loudness, and 
accompanying noise. Thus we may obtain a quantitative 
means of comparing individuals in this respect, and establish 
a standard of average hearing. 

There is a most important factor in hearing, however, which 
is much more difficult of analysis and measurement. This is 
the individual’s ability to recognize small defects in those
sounds with which he has become especially familiar. We all 
know how quickly we note a slight change in a friend’s voice, 
and with what uncanny skill a trained musician will detect 
minute imperfections in very complex sounds. Our approach 
to a quantitative understanding of the importance of this must 
be by an indirect method. First we must construct devices 
so perfect that even the keenest ear cannot find a flaw in their 
rendition, and then step by step we may introduce measured 
imperfections until an observer can detect a fault. In the 
response of individuals to this test there will, of course, be great 
differences; but when we have collated opinions from a wide 
variety of observers we may forecast in a reasonable way the 
degree of mechanical perfection that may be demanded of 
our instruments. 

This, then, has been the philosophy of the investigation of
hearing which has been carried on in Bell Telephone Laboratories
during the past fifteen years: to get an accurate physical
description and a measure of the mechanical operation of
human ears in such terms that we may relate them directly
to our electrical and acoustical instruments; to test the keenness
of the sound-discriminating sense and find what is the
smallest distortion which the mind can perceive and how it
reacts to somewhat larger distortions; and thus to reach a
reasonable basis of design both for separate instruments and
for systems as a whole, to give a proper balance between cost
and performance.

With hearing, speech and music are linked inseparably for 
they only bring a meaning through our aural sense. It is an 
instinctive first thought that they must be heard to be criti- 
cized. They can, nevertheless, be investigated by mechan- 
ical means and be described in the same physical terms that 
we use in describing hearing; and thus to an extent we may 
consider them both objectively. But if we attempt to divide 
the study between speech and music we come at once upon 
the difficulty that speech conveys information by intonation 
as well as by articulate syllables; and this makes it infeasible 
to set a definite boundary between them. A division, how- 
ever, between vocal sounds and instrumental sounds proves 
more useful, for in the one case we are limited by our vocal 
organs which we must take as they are, while in the other we 
have a definite control and can adapt the nature and com- 
plexity of the sounds produced to conform to our sense of hear- 
ing and our musical appreciation. The investigation of speech 
and music has been governed by these general considerations. 
An attempt has been made to establish in definite terms the 
performance and limitation of the voice and, although so far 
in considerably less detail, to find the corresponding factors in 
instrumental music. 

With a clear knowledge of the nature of the sounds that
we must produce and the accuracy with which we must maintain
their form, there remains the problem of securing instruments
which are sufficiently refined for the purpose. Instruments
of remarkable precision are required in the conduct of
the investigation, since if we are to measure the smallest detectable
variations in sounds we must obviously use equipment
which is capable of a degree of exactness beyond these small
quantities. Such instruments would appear at first sight not
to have much utility outside the laboratory, since they are
costly and often complicated and difficult of adjustment.

It is interesting to note, however, that some of the instru- 
ments, in essentially their original laboratory form, have found 
other important uses. Indeed, a surprising number of modern 
acoustical accomplishments have come about through the use 
of slightly modified forms of the apparatus which was originally
developed for these investigations. Modern phonographic
records are produced with an electrical transmitter
which was developed in the very early stages of these studies;
and radio broadcasting has grown up around this same “microphone.”
The reproducing equipment of the modern phonograph
and of the radio were predicated directly upon these
investigations; and talking motion-pictures owe their success 
and much of their apparatus to this same source. 

Although the results which relate to normal speech and 
hearing are naturally the most familiar and widely known, 
there have also been important outgrowths in the way of aids 
to those handicapped in one or the other of these faculties.
In establishing the functioning of the average ear it was obvi- 
ously necessary to investigate a large number of cases and 
among them some which departed rather widely from the 
average. For this study an instrument was devised, now 
known as the audiometer, which has put within the reach of 
all who need it the possibility of an accurate measure of their 
hearing. In quite analogous fashion there grew out of the 
investigation of the limits of hearing a better knowledge of 
ways to provide aids for those partially deaf; and it has even 
become possible to provide means of speech for some persons 
whose vocal cords are gone.

Valuable as these results are, economically the most important
outcome of the work has been the increase of exact knowledge
as to the requirements and limitations to be placed upon
the transmission of speech in the telephone system. As time
goes on there must be an evolution toward even greater perfection
in those particular elements which are most important
to intelligibility. The system is so large that the cost of such
an evolution is immense and changes undertaken without
an accurate knowledge of their value might lead to burdensome
expenditures for disproportionate results; but, with the
facts established by this investigation in hand, we can weigh
any contemplated change and judge whether it is the one that
offers most improvement at the moment and what its ultimate
effect will be in its joint operation with other elements of
the system.

The work that Doctor Fletcher discusses drew at the start
on all the acoustic knowledge available in the literature and
during its progress every effort has been made to use to the
best advantage the information found by other experimenters.
For the most part, however, he describes experiments performed
and conclusions reached in Bell Telephone Laboratories
during investigations, captained in their early stage by Doctor
Crandall and himself, for which since Doctor Crandall’s death 
he has had the full responsibility. No one can speak with 
better knowledge of the facts or with more complete authority 
for the opinions which he expresses. 

The work is not complete — indeed some parts of it are 
hardly more than started; yet its results have been so great, 
both for the original purpose which was planned and for the 
many issues which have since arisen, that it presents a unique
exemplification of the worth of systematic and sustained 
research; and Doctor Fletcher is to be congratulated that he 
has seen it through with such clear vision as permits its pre- 
sentation in its present form. 


H. D. Arnold. 



Part One 

Speech 




SPEECH AND HEARING 


CHAPTER I 

Mechanism of Speaking 

Origin and Evolution of Language 

When beginning a study of language some of the first
questions which naturally arise are: “How is it that people
everywhere do not speak the same language?”, “How were
words first created?”, “Why is such-and-such a person and
such-and-such a thing called this and not that?”, and similar
questions. According to some anthropologists, 150 separate 
languages which seemed to have no common origin were 
spoken among the American Indians. For this reason the 
Indians of one tribe communicated with those of another by 
means of signs and gestures. Thus, the so-called Indian 
sign language grew up. It is very probable that such signs, 
gestures, and expressions of the face were used before the 
evolution of the spoken language had progressed very far. 
According to some philologists, the vocal sounds of very 
primitive people were exclamatory and song-like and used
mainly to express emotion. Sounds mimicking nature came 
to designate certain things connected with the thing imitated. 
As man's power of analysis developed, the sounds gradually 
developed into spoken words having definite meanings. 

According to Sir Richard Paget,¹ human speech began
by the performance of sequences of simple pantomimic gestures
of the tongue, lips, etc., comparable with the natural gestures
(of hands, etc.) which are still made by deaf mutes, and
these gestures were made audible by breathing or grunting.
For example, consider the word “hither.” The tongue makes
the same beckoning gesture while speaking this word as is
made with the hand.

¹ “The Origin of Speech,” Sir Richard Paget, Proc. Royal Soc., May, 1928, p. 157.

Although there are a great many different languages spoken 
in different parts of the earth and each language has a system 
of speech sounds of its own, there is a great similarity among 
these fundamental speech sounds. This is necessarily true 
since there is only a limited range of distinct sounds that can 
be made by the organs of speech. Although the mechanism 
of producing particular speech sounds in the various languages 
is somewhat different, the general mechanism of producing 
speech is similar for all people. 

Description of the Organs of Speech 

The organs of speech are the lungs which by their bellows-like
action supply the streams of air which pass in and out of the
vocal passages, the vocal cords, the tongue, the lips, and
the cavities of the nose and throat. These impress on the
air stream variations which are heard as speech sounds.




In Fig. 1 is shown the cross-section of the human head in
the position to speak into a telephone transmitter. This shows
the relative positions of the various organs of speech. The
vocal cords are a pair of muscular ledges on both sides of the
larynx forming a straight slit through which the breath
passes. The vibration of these vocal cords starts a train of
sound waves which passes through the vocal passages; these
impress on it certain resonant characteristics so that the
vibrations finally emerge from the mouth as speech sounds.
The pure vowels, the diphthongs, the transitionals, and the
semivowels are produced in this manner. Other sounds called
the unvoiced consonants are produced without using the
vocal cords at all. They are produced by passing the air
through small openings or over sharp edges in the mouth.
There is a third class of speech sounds called the voiced
consonants which are produced by a combination of the two
processes just mentioned.

Description of the Formation of the English Speech Sounds 

Different classifications of the spoken sounds of English 
may be made, depending upon the purpose one has in mind. 
The International Phonetic Association uses a basic alphabet 
of 65 different letters and also uses numerous modifiers which 
serve to distinguish several hundred different sounds. Such 
a system, of course, is altogether too complex for use in engi- 
neering work. The revised scientific alphabet, sometimes 
called the N. E. A. alphabet, uses 48 sounds. It is difficult 
for the average person to differentiate between some of these 
sounds. After considering the manner of formation of the 
speech sounds and studying their physical characteristics and 
the interpretation given by the average person, 39 speech 
sounds which can be readily distinguished by an average 
English-speaking person were chosen. These are the same 
sounds as were selected by the Simplified Spelling Board 
except that the sounds in the word ton and the word nation
were considered near enough alike to be designated by a single
letter “o” and the sounds in the words part, not, and father
were sufficiently similar to be designated by a single letter
“a.” These fundamental speech sounds are divided into six
classes, namely, pure vowels, diphthongs, transitionals, semi-vowels,
fricative consonants, and stop consonants. The two
classes of consonants are further divided into the groups
designated voiced and unvoiced. The complete list is tabulated
in Table I. The diacritical marks usually used are
entirely too complicated for use in engineering work, so it will
be noticed that only vertical and horizontal lines above the
letters are used to indicate how the sounds should be pronounced.
Such a simplification certainly makes it very
much easier to write these sounds. At the top of Table I the
pure vowels, the diphthongs and the transitionals are shown in
a diagram which helps to illustrate the manner in which the
sounds are formed. Starting with the sound ū, the lips are
rounded and there is formed a large resonating cavity in the
front part of the mouth and a smaller and less important one
in the throat cavity. Passing down the left side of the triangle
in the diagram from ū to a, the mouth is gradually opened
with the tongue lowered to form the successive vowels. In
all these vowels, the throat resonance is playing only a minor
part. Going up the right side of the triangle from a to ē, the
tongue is gradually raised to the front part of the mouth, thus
forming two resonance chambers, both of which produce
marked effects upon the sounds from the vocal cords.


TABLE I

Classification of the Speech Sounds

[Diagram: triangle of the pure vowels, with the diphthongs and transitionals]

1. Pure Vowels (11)
   Long:   ū (tool), ō (tone), o̍ (talk), a (far), ā (tape), ē (team)
   Short:  u (took), o (ton), a̍ (tap), e (ten), i (tip)

2. Diphthongs (4)
   ī, ou, oi, ew

3. Transitionals (3)
   w, y, h

4. Semi-vowels (5)
   l, r, m, n, ng

5. Fricative Consonants (8)
   Voiced        Unvoiced      Formation of Air Outlet
   v             f             lip to teeth
   z             s             teeth to teeth
   th (then)     th (thin)     tongue to teeth
   zh (azure)    sh            tongue to hard palate

6. Stop Consonants (8)
   Voiced        Unvoiced      Formation of the Stop
   b             p             lip against lip
   d             t             tongue against teeth
   j             ch            tongue against hard palate
   g             k             tongue against soft palate
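The classification in Table I lends itself to a small data structure. The following is only an illustrative sketch: the ASCII labels such as `u_tool` and the helper names are assumptions made here for convenience and are not the book's notation. It records the six classes, checks that they add up to the 39 sounds mentioned above, and looks up the unvoiced partner of a voiced consonant.

```python
# Table I as a plain dictionary: class name -> list of sounds.
# The ASCII labels (e.g. "u_tool") are stand-ins chosen for this sketch,
# not the notation used in the book.
SPEECH_SOUNDS = {
    "pure vowels": [
        "u_tool", "o_tone", "o_talk", "a_far", "a_tape", "e_team",  # long
        "u_took", "o_ton", "a_tap", "e_ten", "i_tip",               # short
    ],
    "diphthongs": ["i", "ou", "oi", "ew"],
    "transitionals": ["w", "y", "h"],
    "semi-vowels": ["l", "r", "m", "n", "ng"],
    "fricative consonants": ["v", "f", "z", "s", "th_then", "th_thin", "zh", "sh"],
    "stop consonants": ["b", "p", "d", "t", "j", "ch", "g", "k"],
}

# (voiced, unvoiced, formation) triples copied from the two consonant tables.
FRICATIVE_PAIRS = [
    ("v", "f", "lip to teeth"),
    ("z", "s", "teeth to teeth"),
    ("th_then", "th_thin", "tongue to teeth"),
    ("zh", "sh", "tongue to hard palate"),
]
STOP_PAIRS = [
    ("b", "p", "lip against lip"),
    ("d", "t", "tongue against teeth"),
    ("j", "ch", "tongue against hard palate"),
    ("g", "k", "tongue against soft palate"),
]


def class_counts(table):
    """Number of sounds in each class."""
    return {name: len(sounds) for name, sounds in table.items()}


def total_sounds(table):
    """Total number of sounds over all classes."""
    return sum(len(sounds) for sounds in table.values())


def unvoiced_partner(sound):
    """Return the unvoiced consonant formed like `sound`, or None if there is none."""
    for voiced, unvoiced, _formation in FRICATIVE_PAIRS + STOP_PAIRS:
        if voiced == sound:
            return unvoiced
    return None


if __name__ == "__main__":
    print(class_counts(SPEECH_SOUNDS))
    print(total_sounds(SPEECH_SOUNDS))
    print(unvoiced_partner("v"))
```

Keeping the voiced and unvoiced consonants as explicit pairs mirrors the layout of the table, in which each row shares one formation of the air outlet or stop.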

In Fig. 2 are shown the tongue and the lip positions for
forming these vowel sounds.¹ An infinite number of different
shadings of these vowels may be produced by placing
the mouth in the various intermediate positions, but the ones
shown were chosen as being the most distinct. It is very
difficult to define differences between vowels and consonants
which are satisfactory in all cases. In general, however, pure
vowels are characterized by continuous wave trains formed in
the throat and passing through opened passages. Sounds at
the lower end of the consonant list, on the other hand, represented
by the unvoiced stop consonants, are characterized by
a short group of waves formed mostly by the mouth and
released suddenly by opening the lips. Between these two
extremes is the graduated series of diphthongs, transitionals,
semi-vowels, voiced fricatives, unvoiced fricatives, and voiced
stop consonants.

¹ For a further discussion see “Sounds of Spoken English,” by Walter Ripman.

The diphthongs ī, ou, oi, and ew are really combinations
of two of the pure vowels. If the mouth is placed in the position
to say a and then changed over without interrupting
the sound to the position to say ē, the result is signified in
writing by the letter ī. Similarly, as illustrated in Table I,
the diphthong ou is a combination of a and ū, the diphthong
oi the combination of o̍ and ē, and the diphthong ew the
combination of ē and ū.

Figure 2. Tongue Positions.

The three sounds, w, y, and h, are called transitionals since
they represent a particular way of beginning the vowel sounds.
If the mouth is placed in the position to say ū and then
suddenly changed so as to form any other vowel in the diagram,
the result obtained is signified in writing by placing
w before the vowel. In a similar way we obtain the effect
designated by y if the position of the vowel suddenly changes
from ē to any other vowel. The diphthong ew is different
from the sound represented by yū only in that when the y
is used the transition from the ē to the ū sound is much more
rapid than when the diphthong is formed. An infinite variety
of diphthongs and transitionals can be formed by varying
both the rate of change and the size of the vocal cavities
necessary to form the two vowels. The most distinct and
principal ones used in our language are those shown in Table I.
When a vowel begins a syllable, it is formed by suddenly
opening the glottis and thus permitting the air which has
been held in the lungs to escape into the mouth formed for
the proper vowel. If the glottis is originally open the vowel
is started by the sudden contraction of the lungs. Under
these conditions the effect would be represented in writing by
placing h before the vowel.

The sounds l, r, m, n, and ng are classified as semi-vowels
since for these sounds the passage from the vocal cords to
the outside air is partially blocked. In the case of l and r
the sound is allowed to flow around the tongue which is placed
in a particular position in the mouth. For the sounds m,
n, and ng the usual path through the mouth is interrupted 
so that the sound and the air accompanying it flow through 
the nasal cavities. For this reason they are sometimes called 
nasalized stop consonants. 

The fricative consonants are characterized by the rushing 
sound of the breath through the characteristic air outlet 
which is usually of very small dimensions. The manner in 
which these sounds are formed is evident from Table I. For 
producing the sound f the outlet is produced by holding the 
lower lip to the upper teeth. If while the f sound is being thus 
produced, a tone from the vocal cord is also sounded, the 
speech sound v is formed. Similarly, other unvoiced and 
voiced fricative sounds are produced by changing the nature 
of the air outlet as shown in the table. The stop consonants
are classified in the same way as the fricatives, into
voiced and unvoiced pairs. For example, b and p
are both characterized by a stop formed with the lip against
the lip, d and t with the tongue against the teeth, j and ch
with the tongue against the hard palate, and g and k with
the tongue against the soft palate.

The voiced sounds may be divided into two classes: those
produced by a continuous flow of air which may be called the
continuants and those produced by stopping the sound flow
in certain ways which may be called the stops. The former
group, including vowels and voiced consonants, is the one
used to carry the pitch in singing. It is seen from Fig. 1 that
such sounds pass through two variable resonating cavities, 
namely, the mouth and the throat. For this reason, all voiced 
sounds are characterized by having component frequencies 
magnified in two particular regions. It is largely the charac- 
teristics of these regions of resonance that distinguish one of 
these sounds from another, especially when they are sung. In 
speaking, however, the way the sounds are started and ended 
has considerable to do with the ability to recognize them. In 
any of the voiced sounds it is important to notice that it is the 
modulation of the cord tone that gives the distinctive sound 
rather than the characteristic of the vocal cords. The latter 
determines the type of voice and identifies the person who is 
speaking but has little to do with the characteristics of the 
speech sounds which determine their recognition. 
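The statement that the regions of resonance, rather than the cord tone itself, distinguish one voiced sound from another can be illustrated numerically. The sketch below is not taken from the text: it assumes a cord tone whose harmonics fall off as 1/n, a standard second-order resonance curve for each cavity, and purely illustrative resonance frequencies of 500 and 1500 cycles per second with a bandwidth of 100. All function names are hypothetical.

```python
import math


def resonance_gain(f, f0, bandwidth):
    """Magnitude response of a simple second-order resonator tuned to f0.

    The gain is 1 at very low frequencies and rises to f0 / bandwidth at f0.
    """
    q = f0 / bandwidth
    x = f / f0
    return 1.0 / math.sqrt((1.0 - x * x) ** 2 + (x / q) ** 2)


def harmonic_spectrum(pitch, resonances, bandwidth=100.0, n_harmonics=40):
    """Return (frequency, amplitude) pairs for the harmonics of a cord tone
    of the given pitch after it has passed through the resonances in cascade."""
    spectrum = []
    for n in range(1, n_harmonics + 1):
        f = n * pitch
        amp = 1.0 / n  # assumed fall-off of the cord-tone harmonics
        for f0 in resonances:
            amp *= resonance_gain(f, f0, bandwidth)
        spectrum.append((f, amp))
    return spectrum


def strongest_in_band(spectrum, lo, hi):
    """Frequency of the strongest harmonic lying between lo and hi."""
    band = [(amp, f) for f, amp in spectrum if lo <= f <= hi]
    return max(band)[1]


if __name__ == "__main__":
    cavities = (500.0, 1500.0)  # illustrative values, not from the text
    for pitch in (100.0, 125.0):
        spectrum = harmonic_spectrum(pitch, cavities)
        print(pitch,
              strongest_in_band(spectrum, 300.0, 800.0),
              strongest_in_band(spectrum, 1200.0, 1800.0))
```

With these assumed values, changing the pitch of the cord tone moves the individual harmonics but leaves the two magnified regions where the resonances put them, which is the sense in which the cavities and not the cords carry the identity of the sound.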

Artificial Production of Speech Sounds 

Once having a clear picture of the mechanism of speaking, 
one can easily see how an electrical apparatus can be made 
which will produce some of these vowel sounds. A convenient 
form of such apparatus is one in which the generator of the 
electrical vibrations is an overloaded vacuum tube oscillator 
which corresponds to the vocal cords. The complex wave 
generated consists of a fundamental and a large series of 
harmonics. These electrical vibrations are conducted through 
two electrically resonant circuits and then to a loud 
speaker. A schematic of the circuit for producing artificially 
the simple speech sounds is shown in Fig. 3. An arrangement 



similar to this was first used by Stewart.¹ If the resistance,
capacity, and inductance of the two resonant circuits are
adjusted to have resonant frequencies and dampings corresponding
to those existing in the mouth and throat when
one of the vowel sounds is being produced, then sound waves
issuing from the receiver will have characteristics similar to
this vowel sound. In producing the diphthongs (and in
English speech most of the vowels are spoken as diphthongs)
the resonant properties of these circuits must be varied from
one condition to another in a definite manner.²

Fig. 3. Schematic of Circuit for Producing Artificially the Simplest Speech Sounds.
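The arrangement just described, a source rich in harmonics followed by two resonant circuits, can be imitated in a few lines of code. What follows is a minimal sketch under stated assumptions, not a reconstruction of the apparatus: a sawtooth wave stands in for the overloaded oscillator, each circuit is treated as a series resonant circuit with f0 = 1/(2π√(LC)) and bandwidth R/(2πL), each is imitated by a standard two-pole digital resonator, and the component values are invented for illustration.

```python
import math


def rlc_resonance(r_ohms, l_henries, c_farads):
    """Resonant frequency and bandwidth (both in cycles per second)
    of a series resonant circuit."""
    f0 = 1.0 / (2.0 * math.pi * math.sqrt(l_henries * c_farads))
    bandwidth = r_ohms / (2.0 * math.pi * l_henries)
    return f0, bandwidth


def sawtooth(pitch, duration, fs):
    """Sawtooth wave: a fundamental plus a long series of harmonics."""
    n_samples = int(duration * fs)
    return [2.0 * ((i * pitch / fs) % 1.0) - 1.0 for i in range(n_samples)]


def resonator(signal, f0, bandwidth, fs):
    """Two-pole digital resonator standing in for one resonant circuit."""
    r = math.exp(-math.pi * bandwidth / fs)        # pole radius sets the damping
    a1 = 2.0 * r * math.cos(2.0 * math.pi * f0 / fs)
    a2 = -r * r
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize_vowel(pitch, circuits, duration=0.5, fs=16000):
    """Pass the source through each (R, L, C) circuit in turn and
    normalize the result to a peak value of 1."""
    wave = sawtooth(pitch, duration, fs)
    for r_ohms, l_henries, c_farads in circuits:
        f0, bandwidth = rlc_resonance(r_ohms, l_henries, c_farads)
        wave = resonator(wave, f0, bandwidth, fs)
    peak = max(abs(v) for v in wave)
    return [v / peak for v in wave]


if __name__ == "__main__":
    # Hypothetical component values giving resonances near 500 and 1600 cycles.
    circuits = [(50.0, 0.1, 1.0e-6), (60.0, 0.1, 0.1e-6)]
    for circuit in circuits:
        print(rlc_resonance(*circuit))
    samples = synthesize_vowel(120.0, circuits)
    print(len(samples))
```

The resonances here are held fixed, so the result corresponds to a sustained vowel. A diphthong would require the two resonant frequencies to be changed gradually during the sound, which this sketch does not attempt.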

Sir Richard Paget has recently developed some acoustic 
apparatus for artificially producing speech. A device for 
making a sound similar to that produced by the vocal cords 
Is attached to a large bellows which is operated by the foot. 
This is attached to resonating air chambers having the proper 
resonant frequencies and damping characteristics for repre- 
senting the various vowel sounds. A good representation of 

^ Stewart, J. Q., “An Electrical Analogue of the Vocal Organs,” Nature^ Vol. iio, 
September 2, 1922, pp. 31 1-3 12. 

² An apparatus similar to that described was used by the author before the New
York Electrical Society, February, 1924, to demonstrate the production of the sounds,
a, e, i, o, u, ya, mamma and papa.





some of the stop consonants is produced by properly inter- 
rupting the sound at an artificial mouth which is provided. 

Artificial Larynx 

Some recent experiences with persons who have lost their
larynx through an operation have emphasized the fact that the



differentiation of the speech sounds is practically all accom- 
plished by the mouth and lip positions and that the sounds 
from the vocal cords act only as a sort of carrier. 

The surgical operation known as a tracheotomy leaves no 
connection between the lungs and the mouth. It is performed 
usually in an emergency to prevent the patient from dying of 





suffocation. When a patient recovers from such an operation,
the process of breathing is carried on by drawing the air in 
and out through a small opening in the neck. Because of this 
by-passing of the larynx, the patient can make no vocal 
sounds. However, if an attachment is made to the opening 
of the wind pipe so that the patient can blow a whistle, some- 
thing similar to that used in toy balloons, and the sound 
directed into the corner of the mouth, the patient can learn 
to talk again. 

Through the cooperation of Dr. J. E. Mackenty of New 
York City and engineers of Bell Telephone Laboratories, a 
device called an artificial larynx was developed and is now 
being used. Figure 4 shows a photograph of this device. 
More than one hundred persons in the United States who have 
undergone this operation are now using it successfully. 



CHAPTER II 


Characteristics of Speech Waves 

Speech sounds radiating from the mouth are transmitted 
through the air by means of pressure waves, successions of
condensations and rarefactions of the air. The magnitudes
of the pressure changes making up these vibrations are exceedingly
small and the wave form is complicated as the cavities
of the mouth and the throat are continually varying in size
and the stream of air is being constantly interrupted. The
physical characteristics of these waves which carry typical 
speech sounds will be discussed in this chapter. 

Methods of Recording Speech Waves 

Inasmuch as the time of passage of such waves is very
short, it is desirable for scientific study to have permanent 
records of the movements of the air particles as the wave 
traverses them. The examination of such records will reveal 
the physical characteristics of the sound wave. 

Helmholtz and some of the other earlier investigators in this 
field analyzed the intoned vowels by listening to the sound 
after it passed through acoustical resonators. Since his time, 
these well-known devices have been called Helmholtz 
resonators. However, due to uncertainties in an observer's 
judgment only a very rough analysis can be made by using 
them. By these means, however, Helmholtz and some of his 
contemporaries found that the vowels on the left side of the 
vowel triangle in Table I were characterized by a single 
resonant reinforcement, and those on the right side by a double 
resonant reinforcement and for each they gave values for the 
characteristic resonant frequencies. 



The sound wave corresponding to a word or syllable, 
assuming that it is free to travel without reflection or refraction,
can be specified in either of two ways: (1) by giving for
every air particle over the entire length of the disturbance 
the displacement at any instant from its position of equilib- 
rium along a line perpendicular to the wave front or (2) by 
giving at every instant of time while the wave disturbance is 
passing the displacement of a single particle. Recording a 
syllable having a duration of one-fifth of a second requires a 
free air space about 200 feet long. Although it is not impos- 
sible to think of some scheme of photographing the rarefactions 
and condensations along such a space, no practical method 
has yet been devised for doing it. The nearest approach to 
it is probably that obtained by Sabine ¹ in photographing the
condensations and rarefactions produced in a large theater 
by the sound of an electric spark. 
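The figure of about 200 feet quoted above follows from the distance a sound wave travels during the syllable. A minimal sketch of the arithmetic is given here; the speed of sound of 1100 feet per second is an assumed round value for ordinary room conditions and is not stated in the text.

```python
SPEED_OF_SOUND_FT_PER_S = 1100.0  # assumed round value for air at room temperature


def wave_train_length_ft(duration_s, speed=SPEED_OF_SOUND_FT_PER_S):
    """Length of air occupied by a sound lasting duration_s seconds."""
    return speed * duration_s


# A syllable lasting one-fifth of a second.
length = wave_train_length_ft(1.0 / 5.0)
```

With the assumed speed the result is 220 feet, in agreement with the round figure of about 200 feet.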

The second method has been used in several ways for 
making permanent records of speech sounds. Usually a light 
diaphragm is used to indicate the movements of the air par- 
ticles. If such a diaphragm is very light and unconstrained 
except by the layers of air on either side of it, it will execute 
practically the same motions during the passage of the sound 
wave as the air particles would execute if the diaphragm were 
absent.²

Probably the nearest approach to such a diaphragm is a 
soap film. An instrument provided with such a film and 
called a Weiss ³ phonoscope, has been used with some success
although considerable difficulties are encountered in keeping 
it in adjustment. A beam of light reflected from the film 
surface indicates its movement. Most such indicating devices 
use diaphragms which are comparatively heavy and are con- 

¹ “American Architect,” 104, pp. 257-279.

² The diaphragm must be lighter than a quantity of air having an equal cross-section
which is vibrating approximately in phase, that is, which is about one-tenth
of the wave length in width. Such a quantity of air weighs about one milligram
per square centimeter of cross-section for a 5000-cycle sound wave.
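The weight of air quoted in this footnote can be checked with a short calculation. The sketch below is only an illustration; the speed of sound (34,400 centimeters per second) and the density of air (0.0012 gram per cubic centimeter) are assumed values that the text does not state.

```python
SPEED_OF_SOUND_CM_PER_S = 34400.0  # assumed
AIR_DENSITY_G_PER_CM3 = 0.0012     # assumed


def air_mass_mg_per_cm2(frequency_hz, fraction_of_wavelength=0.1):
    """Mass of an air layer one-tenth of a wave length thick, per square centimeter."""
    wavelength_cm = SPEED_OF_SOUND_CM_PER_S / frequency_hz
    thickness_cm = fraction_of_wavelength * wavelength_cm
    return AIR_DENSITY_G_PER_CM3 * thickness_cm * 1000.0  # grams to milligrams


mass_at_5000_cycles = air_mass_mg_per_cm2(5000.0)
```

Under these assumptions the layer weighs somewhat less than one milligram per square centimeter, which is consistent with the round figure given.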

³ Mediz.-naturw. Arch., Band I, Heft 2, December 15, 1907. Otto Weiss, “Das
Phonoskop, eine Vorrichtung zur Analyse und Registrierung schwacher Schallqualitäten.”





strained by being clamped at the edge. These diaphragms 
usually have several pronounced resonant frequencies but 
serve to give a rough indication of the type of motion of the 
air particles as the sound wave passes over them. 

Koenig's Phonautograph and Manometric Capsule

One of the first instruments using a diaphragm of this 
description, called the “phonautograph,” was designed by
Koenig and Scott.¹ A photograph of this instrument ² is shown
in Fig. 5. The membrane is clamped at the end of a horn. 
A stylus attached to this membrane makes a trace on the 
smoked paper carried on the revolving drum. 

Koenig also devised the manometric capsule for producing 
the manometric flame which is still used as a demonstration
instrument in many physical laboratories. This apparatus ³
is shown in Fig. 6. It may be seen from the cross-section (A)



Figure 5. 


that the capsule is divided into two compartments separated 
by a membrane usually made of thin rubber. The gas sup- 
plied for the flame enters one compartment and the sound 

¹ Koenig and Scott, “Cosmos,” 14, p. 314, 1859.

² Rousselot, P., “Principes de Phonetique Experimentale,” p. 110.

³ Rousselot, P., “Principes de Phonetique Experimentale,” p. 111.




waves are sent into a tube connected with the other. The 
variations produced in the gas pressure are indicated by a 
rise and fall of the gas flame. By reflecting the gas light 
from a revolving mirror as shown in the figure, these variations 



Figure 6. 


are made visible or may be photographed. Professors Nichols
and Merritt ¹ perfected this method and obtained records of
vowels and spoken words. Figures 7 and 8 show some of
their records.² There have been variations of these two
methods since Koenig's time but no experimenters have been
successful in obtaining a diaphragm which would execute
vibrations even approximately proportional to the pressures
produced in the speech wave. For this reason, any speech
wave pictures obtained with such apparatus give only a rough 
indication of what is taking place as the wave passes through 
the air. 

The Phonodeik 

The most highly developed successor of the phonautograph 
is an instrument devised by D. C. Miller and called by him 
the “phonodeik.” In this instrument the stylus of the

¹ Physical Review, Vol. 7, p. 93, 1898.

² Physical Review, Vol. 7, p. 92, 1898.





phonautograph is replaced by a light-weight mirror system
which reflects rays to a moving film. The following descrip- 



Fig. 7. Record of “R” (Rolling). Obtained by Professors Nichols and Merritt.








Fig. 9. Principle of the Phonodeik.




tion of this instrument is taken from his book, “The Science of 
Musical Sounds.”

“The sensitive receiver of the phonodeik is a diaphragm, Fig. 9, 
of thin glass placed at the end of a resonator horn h; behind the
diaphragm is a minute steel spindle mounted in jeweled bearings,
to which is attached a tiny mirror m; one part of the spindle is
fashioned into a small pulley; a few silk fibers, or a platinum wire
0.0005 inch in diameter is attached to the center of the diaphragm
and being wrapped once around the pulley is fastened to a spring
tension piece; light from a pinhole l is focused by a lens and reflected
by the mirror to a moving film f in a special camera. If the diaphragm
moves under the action of a sound wave, the mirror is rotated by
an amount proportional to the motion, and the spot of light traces
the record of the sound wave on the film.”

Figure 10 shows a photograph ¹ of the phonodeik ready
for use. Professor Miller did a large amount of work in an
endeavor to correct the speech records obtained by the phonodeik
for the distortions introduced by the resonance characteristics
of the horn and diaphragm. These corrections were
for the amplitude distortion only, no attempt being made to 
correct for phase distortion since he was interested mainly 
in the relative amplitudes of the component frequencies and 
not in the relative phases. 

Numerous oscillograms taken by the phonodeik and then 
analyzed by a harmonic analyzer verified the conclusion of 
Helmholtz that the vowels on the left side of the triangle in 
Table I were singly resonant while those on the right side were 
doubly resonant. Although other apparatus is now available 
which gives a very much more accurate copy of speech waves, 
the phonodeik is still a very satisfactory instrument for 
demonstration purposes. 

The Phonograph 

Another method of studying the forms of sound waves 
uses the phonograph, invented in its original form by Edison
in 1877, which makes a permanent record of speech waves. 

¹ Taken from D. C. Miller's book, “Science of Musical Sounds.”




The trace on the phonograph record can be made visible in a
number of ways. Hermann ¹ in 1890 and Bevier ² in 1900 used
a delicate tracing point carrying a mirror mounted so that as 



Fig. 10. — The Phonodeik. 


the point traversed the undulations in the record, a beam 
of light, reflected from the mirror, fell upon a moving photo- 
graphic plate. The record can be turned so slowly that the 

¹ Hermann, L., Pflüger's Archiv, 45, 282 (1889); 47, 42, 44, 347 (1890); and others.
² Bevier, L., Physical Review, 10, 193 (1900).






mass and elasticity of the lever system do not cause any 
appreciable distortion. Scripture ¹ did a large amount of
work using a similar method. In place of the mirror system 
he substituted a long lever which carried a stylus at its end. 
As it vibrated it traced an undulating curve on smoked paper 
mounted on a revolving drum. The arrangement of Scripture's
apparatus ² is shown in Fig. 11. Using this apparatus,


Figure 11. (Parts labeled: rotator, gramophone disc, rotating tube.)



Scripture obtained tracings ³ shown in Fig. 12 which give
the wave form for the vowels spoken by Joseph Jefferson and 
recorded on a disc record using facilities available at that 
time (1903). 

Just recently there has been considerable improvement in 
the technic of recording phonograph records. In the old 
method the power of the sound being recorded was used to 
operate the recording instrument. The old phonautograph 
with a sharp needle for a stylus illustrates the principle of the 
phonograph method of recording used until just recently. 
In order to obtain good records by this method it is necessary 
to obtain large intensities close to the horn of the recorder. 
Figure 13 shows some of the difficulties encountered when 
using this method. This and the next figure were taken from 

¹ Scripture, E. W., “Researches in Experimental Phonetics,” Carnegie Institution
of Washington Publication No. 44.

² Scripture, E. W., “Researches in Experimental Phonetics,” Carnegie Institution
of Washington Publication No. 44, p. 24.

³ Scripture, E. W., “Researches in Experimental Phonetics,” Carnegie Institution
of Washington Publication No. 44, p. 51.






Fig. 12. Waves from Vowels by Joseph Jefferson.




a paper ¹ by J. P. Maxfield and H. C. Harrison, two engineers
of Bell Telephone Laboratories who were principally respon- 
sible for the development of the new electrical method of 
recording. In the new method high-quality telephone appa- 
ratus with vacuum tube amplifiers is used, which gives more 
freedom to the artists and much better control of the cutting 



Figure 13. 


needle. The amount of power available to operate the cutting 
needle is not dependent upon the original acoustic energy in 
the sound wave, but may be made any convenient value by 
changing the amplification in the vacuum tube amplifiers. 
Figure 14 shows the same orchestra as shown in Fig. 13 but 
recording by the new electrical process. The greater flexi- 
bility of the new system of recording makes possible very 
much better records of speech and such records naturally 
yield more reliable results when analyzed by the method above
mentioned. 

The phonograph method has a distinct advantage that the 
speech may be reproduced and compared with the original. 

¹ “Methods of High Quality Recording and Reproducing of Music and Speech
Based on Telephone Research,” presented at A.I.E.E. Convention at New York City,
February 8-11, 1926.





If it is a sufficiently good reproduction it may be safely
assumed that the wave on the record faithfully represents the 
original sound. The analysis of the speech wave into com- 
ponent frequencies can be greatly facilitated when it is on a 
phonograph record for all the component frequencies can be 
located by changing the speed of the record during the repro- 
duction and noting the response of a single resonator, either 
acoustic or electric. 
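The principle of locating components by changing the speed of the record may be put in numerical form. When the record is run at a multiple s of its recording speed, a component originally of frequency f is reproduced at s times f, so that a fixed resonator tuned to a frequency R responds whenever f equals R divided by s. The sketch below illustrates this; the function names, the resonator tuning of 1000 cycles, and the speed ratios are hypothetical values chosen only for the example.

```python
def original_frequency(resonator_hz, speed_ratio):
    """Frequency on the record that excites the resonator at this playback speed.

    speed_ratio is the playback speed divided by the recording speed.
    """
    return resonator_hz / speed_ratio


def locate_components(resonator_hz, responding_speed_ratios):
    """Component frequencies implied by the speeds at which the resonator responds."""
    return sorted(original_frequency(resonator_hz, s) for s in responding_speed_ratios)


# Hypothetical observation: a 1000-cycle resonator responds at three speeds.
components = locate_components(1000.0, [2.0, 1.0, 0.5])
```

In this hypothetical case the responses at twice, once, and one-half the recording speed indicate components at 500, 1000, and 2000 cycles per second.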

Another method of recording speech which is now assuming 
considerable commercial importance, makes use of a motion- 
picture film. The record consists of variations in the density 
of the film corresponding to pressure variations in the sound 
wave. There are two principal methods used. In the first 



Figure 14. 


a slit of fixed width is placed near the moving film and illu- 
minated by an electric lamp whose intensity is controlled by 
variations in the pressure of the sound wave. For this purpose 
the apparatus for electrical recording which has been described 
can be used, but in place of the cutting tool a lamp is substituted.
A circuit is arranged so that variations in the voltage
impressed upon the lamp cause similar variations in the
intensity of the light emitted. Various types of lamps have
been proposed, the requirement for obtaining faithful recording
being that the intensity of the light emitted be proportional
to the voltage impressed upon its terminals. It is difficult to
obtain an ideal lamp of this sort although some fairly good
records have been produced in this manner.

Fig. 15. Sound Film Records. Above, Light Valve Records, with 1000-Cycle Tone; Below, Profile Records.

In the second method the intensity of the light is held 
constant while the width of the slit is made to vary in accord- 
ance with changes in the amplitude of the sound wave. A 
device for doing this is called a light valve. It consists 
essentially of a shutter or shutters which are controlled 
electromagnetically. If this device is properly constructed, 
it can be made to give very accurate records. The first two 
charts of Fig. 15 give sample records taken by the latter 
method. The first is for the word “farmers”; the second is
for a pure tone having a frequency of 1000 cycles per second.
It is interesting to note that such a record gives a true picture 
of the variations in the density of the air at any instant along 
the train of waves passing through it. The original sounds 
can be reproduced from such records by means of suitable 
apparatus. For this purpose a narrow beam of light is trans- 
mitted through the moving film and permitted to fall upon a 
photo-electric cell. The variations in the intensity of the 
light which are caused by variations in the density of the film 
cause similar variations in the photo-electric currents. These 
currents are then magnified by means of vacuum tube ampli- 
fiers and finally reproduced as sound waves by a loud speaker. 
Although such records can now be made very accurately and 
are useful for purposes of analysis, they do not give a picture 
of the wave form that is as readily interpreted by the eye as 
an oscillogram. 

The High-quality Oscillograph 

On account of the importance to the telephone industry 
of obtaining accurate pictures of speech waves, considerable 
research work has been done in Bell Telephone Laboratories 
to perfect the method of recording speech sounds by means 
of an oscillograph. This work was directed by the late 
Dr. Crandall, most of the details of the designing of the 
apparatus and the making of the records being done by 
C. F. Sacia. 

Briefly stated, the principles involved in this method are 
as follows: Speech waves are picked up by a telephone trans- 
mitter and converted into electrical waves. They are then 
magnified by means of a special amplifier and sent into an 
oscillograph, where they cause a tiny ribbon to vibrate. The 
motion of this ribbon is then photographed on a moving film. 
The perfection of this instrument for recording speech sounds, 
as well as many other instruments used in acoustic measurements,
some of which will be described later, depends upon the
development of three important devices, namely: (1) the
condenser transmitter and the means of calibrating it, (2) the





vacuum tube with the circuit arrangements for producing 
amplification and electrical oscillations, and (3) the oscillo- 
graph. By means of these devices, an instrument for record- 
ing speech sounds was finally obtained which was more nearly 
free from distortion than any which had yet been used. For 
this reason a somewhat detailed description of it will be given.¹

The element for converting the sound waves into electrical 
waves is the condenser transmitter which has been thoroughly 
investigated by E. C. Wente. It has an approximately uniform
response characteristic from 0 to 8000 cycles per second, giving
about 3 X volts at its terminals per bar (dyne per square 
centimeter) pressure on its diaphragm. The vacuum tube 
amplifier consisted of 7 stages, the last stage containing 8 tubes 
in parallel, making 14 tubes in all. This provides a voltage 
amplification of about 40,000 and the 8 tubes in the last stage 
make it possible to work into the low impedance oscillograph 
vibrator without using a coil transformer. The coupling 
between the tubes was entirely free from coils. This was 
necessary in order to preserve a uniform characteristic for 
various frequencies. The vibrator of the oscillograph was 
specially constructed, having small mass, high tension, and a 
moderate amount of damping. This was necessary to obtain 
a uniform response. These three elements of the recording 
system were connected together so as to produce as little 
distortion as possible. An ideal system would reproduce all 
frequencies with the same efficiency and produce phase lags 
which are proportional to the frequency. The energy output 
should also be proportional to the energy input. Such a sys- 
tem would record sounds without any distortion. In Fig. 16 
are shown the amplitude and phase characteristics of the 
system as it was finally arranged. It will be seen that the 
first two requirements mentioned are well fulfilled. The 
measurements indicated that throughout the range of inten- 
sities used, the last condition was also well fulfilled. 
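The first two requirements stated above can be expressed as a simple test on measured data: the amplitude response must be the same at all frequencies, and a phase lag proportional to frequency is the same thing as a constant time delay. The following is a minimal sketch of such a test. The sample figures are hypothetical and are not read from Fig. 16, and the 5 per cent tolerance is an arbitrary assumption.

```python
def delay_seconds(frequency_hz, phase_lag_degrees):
    """Time delay equivalent to a given phase lag at a given frequency."""
    return phase_lag_degrees / (360.0 * frequency_hz)


def is_distortionless(points, tolerance=0.05):
    """points: list of (frequency_hz, relative_amplitude, phase_lag_degrees).

    True when the amplitude is uniform and the delay is constant,
    each within the given fractional tolerance.
    """
    amplitudes = [a for _, a, _ in points]
    delays = [delay_seconds(f, p) for f, _, p in points]
    flat = (max(amplitudes) - min(amplitudes)) <= tolerance * max(amplitudes)
    mean_delay = sum(delays) / len(delays)
    constant_delay = all(abs(d - mean_delay) <= tolerance * mean_delay for d in delays)
    return flat and constant_delay


# Hypothetical data: phase lag proportional to frequency (constant delay).
proportional = [(500.0, 1.0, 9.0), (1000.0, 1.0, 18.0), (2000.0, 1.0, 36.0)]
# Hypothetical data: phase lag proportional to the square root of frequency.
square_root = [(500.0, 1.0, 20.0), (1000.0, 1.0, 28.28), (2000.0, 1.0, 40.0)]
```

Under these assumptions `is_distortionless(proportional)` holds while `is_distortionless(square_root)` does not, since in the second case the components are delayed by different amounts.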

The apparatus was sufficiently powerful to record sounds 

¹ For a more complete description see “Sounds of Speech,” by I. B. Crandall,
published in The Bell System Technical Journal, October, 1925.





spoken in an ordinary tone of voice when the speaker’s lips 
were about 3 inches from the transmitter. A key was pressed 
by the speaker just before the sound was spoken, which 
released a shutter placed before a rotating film drum on 
which the record from the oscillograph vibrator was traced. 
With the size of the drum and the speed at which it was 
rotated each one-hundredth of a second corresponded to 
2 inches or more on the time scale. The apparatus was 
arranged so that the helical trace made on the record was 



Fig. 16. Overall Frequency Characteristics of Amplitude and Phase of the
Recording System. Curve A: Oscillographic Amplitude per Unit of Pressure
on Transmitter Diaphragm. Curve B: Phase Lag of Oscillographic
Amplitude Behind Pressure on Diaphragm.


200 inches in length for one second of time. As is indicated
from the records obtained, special care was taken with the
optical system to insure fine definition, and in the development
of the films to obtain the proper contrast. 
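The time scale of the record follows directly from the film speed of 200 inches per second given above. A brief sketch of the arithmetic is shown here; the two example frequencies are chosen only for illustration.

```python
FILM_INCHES_PER_SECOND = 200.0  # length of trace per second, as stated in the text


def inches_per_interval(seconds):
    """Length of trace corresponding to a time interval."""
    return FILM_INCHES_PER_SECOND * seconds


def inches_per_cycle(frequency_hz):
    """Length of trace occupied by one complete wave of the given frequency."""
    return FILM_INCHES_PER_SECOND / frequency_hz


hundredth_of_second = inches_per_interval(0.01)
low_pitch_wave = inches_per_cycle(120.0)     # illustrative fundamental
high_frequency_wave = inches_per_cycle(5000.0)  # illustrative high component
```

One-hundredth of a second thus occupies 2 inches, a single wave of a 120-cycle fundamental extends over more than an inch and a half, and even a 5000-cycle component occupies about 0.04 inch.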

Typical Speech Waves 

The wave forms of the words “farmers,” “seems,” “poor,” 
and “alters” taken by this method are shown in Figs. 17, 18,





19, and 20. These serve to illustrate the complicated structure
of speech waves, and the effects of starting and stopping the 
sound. It will be seen in Fig. 17, which gives the oscillogram 
of the word “farmers,” that the first letter sound “f” is 


Fig. 17. Wave Form of the Word “farmers.”


characterized by very high frequencies. After these high
frequencies the “a” sound is produced by only five complete
waves having a fundamental frequency of
approximately 120 cycles per second. The “a” sound is followed
by about twenty complete waves of the “r” sound having
this same fundamental frequency, followed by about nine
complete waves of the “m” sound, also with the same frequency.
As the “er” sound was reached the pitch of the voice was
slightly raised to a pitch corresponding to a fundamental
frequency of about 130 cycles per second. This was followed

Fig. 18. Wave Form of the Word “seems.”


by the “s” sound, again characterized by very high frequencies.
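The counts of complete waves given above may be converted into durations, since each wave lasts the reciprocal of the fundamental frequency. The sketch below carries out this conversion for the voiced sounds whose counts are stated; the table layout and function name are merely illustrative.

```python
# (sound, number of complete waves, fundamental in cycles per second), from the text
SEGMENTS = [("a", 5, 120.0), ("r", 20, 120.0), ("m", 9, 120.0)]


def duration_ms(cycles, fundamental_hz):
    """Duration in milliseconds of a given number of complete waves."""
    return 1000.0 * cycles / fundamental_hz


durations = {name: duration_ms(cycles, f) for name, cycles, f in SEGMENTS}
total_ms = sum(durations.values())
```

On this reckoning the “a” lasts about 42 milliseconds, the “r” about 167, and the “m” 75, or roughly 0.28 second for the three together.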

Charts for all the speech sounds both for male and female 
voices were obtained by means of this apparatus. In Figs. 21
to 32 complete charts are given for twelve of these records, the 
ones selected by Crandall as being typical. In Figs. 33, 34, 
and 35 are shown the records taken by the same voice for 




all the long vowels, the short vowels, and the semi-vowels.
Only the typical part of the wave is given for each case. These 
pictures show the forms of the waves as they emerge from the 
mouth. In a room with reflecting walls, the sound which 
finally reaches the ear of a person three or four feet away 
from the speaker is a combination of the original wave and 
several reflected waves. The amplitudes and phases of the 
components in the reflected waves which reach the ear are very 

Fig. 19. Wave Form of the Word “poor.”

different from those of the original and also from each other, 
so that when they combine at the ear they form a wave with a 
shape entirely different from that of the original wave emerging 
from the mouth. If the phases only of the components are 
changed and the relative amplitudes remain the same, the 
ear usually recognizes no change; in other words, the ear does 
not ordinarily recognize phase differences. To illustrate this 
change in wave form with phase shift, two graphs representing 








the vowel sound “ah” are shown in Fig. 36. The amplitudes
of the component frequencies were experimentally determined 
and the graphs were then calculated. The first wave form 
represents the wave picture when the component frequencies 
have no phase displacements. The component frequencies 
of the second wave form have the same amplitudes as the 


Fig. 20. Wave Form of the Word “alters.”




first ones but the phase displacements are proportional to the 
square root of the frequency. This kind of phase distortion 
is produced by a non-loaded cable telephone line. As stated, 
if the phase displacement is proportional to the frequency of 
the component, the wave picture does not change. The two 
wave forms are quite different although the acoustic spectra 








Fig. 21. “a” as in “father.” Spoken by M.A., Male, Low-Pitched.

Fig. 22. “u” as in “put.” Spoken by M.A., Male, Low-Pitched.

Fig. 23. “o” as in “ton.” Spoken by F.D., Female, High-Pitched.

Fig. 24. “i” as in “tip.” Spoken by …, Low-Pitched.

Fig. 26. Spoken by M.B.

Fig. 27. “moo.” Spoken by M.B.

Fig. 28. Spoken by M.B.

Fig. 30. “cha.” Spoken by M.A.

Fig. 32. “sa.” Spoken by M.B.


obtained from them would be the same and the ear under
most circumstances would identify them as the same sound. 
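The effect of phase upon wave form may be illustrated by a small calculation. In the sketch below a wave is built from five harmonics, first with no phase displacement, then with lags proportional to the square root of the frequency, and finally with lags proportional to the frequency. The amplitudes, the number of samples, and the constants of proportionality are hypothetical values chosen for the illustration; they are not the measured values used for Fig. 36.

```python
import math

N = 256                                  # samples in one fundamental period
AMPLITUDES = [1.0, 0.6, 0.9, 0.4, 0.2]   # harmonics 1 to 5 (hypothetical values)


def synthesize(phase_lag):
    """One period of the wave; phase_lag(k) is the lag in radians of harmonic k."""
    return [sum(a * math.sin(2 * math.pi * k * n / N - phase_lag(k))
                for k, a in enumerate(AMPLITUDES, start=1))
            for n in range(N)]


def spectrum(wave):
    """Amplitudes of harmonics 1 to 5, from a direct Fourier sum."""
    amps = []
    for k in range(1, len(AMPLITUDES) + 1):
        re = sum(w * math.cos(2 * math.pi * k * n / N) for n, w in enumerate(wave))
        im = sum(w * math.sin(2 * math.pi * k * n / N) for n, w in enumerate(wave))
        amps.append(2.0 * math.hypot(re, im) / N)
    return amps


def is_shifted_copy(a, b, tol=1e-9):
    """True if b equals a displaced in time by some whole number of samples."""
    return any(all(abs(a[(n - s) % N] - b[n]) <= tol for n in range(N))
               for s in range(N))


no_shift = synthesize(lambda k: 0.0)
sqrt_law = synthesize(lambda k: 2.0 * math.sqrt(k))            # lag proportional to sqrt(frequency)
linear_law = synthesize(lambda k: 2 * math.pi * k * 32 / N)    # lag proportional to frequency
```

All three waves yield the same acoustic spectrum. The wave with lags proportional to frequency is merely the original displaced in time, whereas the wave with lags following the square-root law has a different shape altogether.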

In Fig. 37 the graphs of four vowel sounds are shown. 
The first two correspond to “a” as in “father” but pronounced
at different pitches, the first by a man and the second
by a woman. The wave pictures of these sounds are quite 

Fig. 33. Long Vowels. Panels: “oo” as in pool; “o” as in tone; “a” as in talk; “a” as in father; “a” as in tape; “e” as in teem.


different yet the ear will identify both of them as the vowel
“a” more than 99 per cent of the time. The first and third
pictures and the second and fourth pictures look much more 
alike yet they are never confused by the ear. It is true that 
the ear recognizes some similarity between the first and third 
wave forms; they are both male voices and have the same 




pitch. These illustrations are sufficient to show that the 
characteristic that identifies the particular fundamental speech 
sound being spoken is not determined entirely by the form of 
the speech wave. 

Theories of Vowel Production

The question therefore arises as to what characteristics of 
a speech sound differentiate it from another speech sound. 

Fig. 34. Short Vowels. Panels: “o” as in ton; “a” as in tap; “e” as in ten; “i” as in tip.


It is evident from the preceding paragraphs that the pitch 
or the wave form of the vowel is not its distinguishing feature.
Two theories of vowel production have been advanced, namely, 
the harmonic or steady state theory and the inharmonic or 
transient theory. In spite of the fact that Helmholtz ¹ showed

¹ Helmholtz, “Sensations of Tone.”




that these two theories were different only in the point of view 
and the method of representing the same mechanism of vowel 
production, we still have advocates of the two theories. 

The harmonic theory was first advocated by Wheatstone in 
1837. According to this theory, the vocal cords generate a 
complex wave having a fundamental and a large number of 
harmonics. The component frequencies are all exact mul- 
tiples of the fundamental. As described under the paragraph 

Fig. 35. Semi-Vowels. Panels: “l” as in lee; “r” as in pert; “m” as in moo; “n” as in noo; “ng” as in ngoo.


on the mechanism of speaking, when these waves pass through 
the throat, the mouth, and the nasal cavities, those frequencies 
near the resonant frequencies of these cavities are radiated 
into the air very much magnified, the amount depending upon 
the damping constant of the cavity. These reinforced frequency
regions determine the vowel quality.

According to the inharmonic theory of Willis (1829) and
Hermann, and now advocated by Scripture, the vocal cords act





only as an agent for exciting the transient frequencies which 
are characteristic of the vocal cavities. A puff of air from 
the glottis sets the air in these cavities into vibration. This 
vibration soon diminishes until it is started anew by a second 




Figure 36. 


puff. According to this theory, the puffs do not necessarily 
follow each other periodically and hence the name “inhar- 
monic.” However, it is hard to see how the physical mechan- 
ism in the throat can produce anything but fairly regular 
puffs since these are controlled by the elastic properties of the 
vocal cords and the two resonant columns of air on either side 
of them. An examination of the records of speech sounds
shows that this is true. The different waves succeed each 
other quite regularly. On the other hand this examination 
also supports the view that these regular puffs do excite the 
transients of the mouth and throat cavities, for the ampli- 
tudes are large at the beginning of the wave and gradually 
die away toward the end. This is shown on the records 
and particularly on the three records in Figs. 21, 24, and 25. 
When the pitch is high, the natural vibrations do not have 
time to die down before another pulse sets them going again. 







This is illustrated in the second and fourth pictures of Fig. 37. 

It is evident then that in this theory, as well as in the previous
one, the vowel quality is dependent upon the natural frequencies
and damping of the vocal cavities.

The difference in the two theories is not, as some suppose, 
a difference in the conception of what is going on while the 
vowel sounds are being produced, but in the method of repre- 
senting or describing the motions in definite physical terms. 
The second point of view enables one to visualize in a more 
direct way what is taking place and consequently is of greater 


Figure 37. Panel labels: “ah” as in father; “ah” as in father; “i” as in tip; “o” as in ton. Pitch labels: 110 (male), 229 (female), 268 (female).

value to the phonetician interested in the mechanism of speech 
production. It probably enables one better to grasp the 
fundamental characteristic differences between the vowels. 

The first point of view is probably more useful to the 
engineer who is interested in designing telephone systems to 
transmit speech properly. The separation of the speech into 
its component frequencies makes it possible to see quickly 
which frequencies must be transmitted by the system to 
carry completely all the characteristics of speech. A numerical 
example may help to make this clear. If the force which is 
acting on the resonant cavity of the mouth due to the vibration 
of the vocal cords is designated by F(k) for the kth component, 
and f₀ is a natural frequency of the mouth chamber, 
f the fundamental frequency of the sound, and Δ the damping 
constant, then it can be shown that as a first approximation 
the amplitude of the kth component of the pressure wave 
radiated into the air from the mouth is 



where C is a constant which determines the scale used for 
representing the amplitudes. The function F(k) varies from 
sound to sound and with different individuals, but for purposes 
of illustration it will be assumed that 

F(k) = …                                   (2) 

This assumption seems to give results which correspond 
roughly to the experimental results obtained with a typical 
speaker. Typical values for Δ and f₀ for the sound “a” as 
in “father” are 500 and 900, respectively. If the sound is 
spoken with the fundamental f = 125, then the amplitudes 
computed from this formula are shown in the top chart of 
Fig. 38. When the same sound is pronounced at a pitch 
corresponding to f = 250, the amplitudes are as shown in the 
bottom chart of this figure. Such representations of the 
relative amplitudes of the different frequency components 
are called acoustic spectra. 
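
As a rough illustration of how such a spectrum arises, the following sketch computes relative harmonic amplitudes for the values quoted above (natural frequency 900, damping constant 500, fundamentals 125 and 250). It does not reproduce the formula of the text: it assumes the ordinary steady-state response of a single damped resonator and takes the exciting force equal for every harmonic, F(k) = 1. The function names and the 4000-cycle upper limit are likewise illustrative choices.

```python
import math


def harmonic_amplitudes(f, f0=900.0, damping=500.0, f_max=4000.0):
    """Relative amplitude of each harmonic k*f up to f_max.

    Assumption (not the formula of the text): steady-state displacement
    response of one damped resonator,
        1 / sqrt((w0^2 - w^2)^2 + (2 * damping * w)^2),
    driven with equal force on every harmonic, F(k) = 1.
    """
    w0 = 2.0 * math.pi * f0
    spectrum = []
    k = 1
    while k * f <= f_max:
        w = 2.0 * math.pi * k * f
        response = 1.0 / math.sqrt((w0**2 - w**2) ** 2 + (2.0 * damping * w) ** 2)
        spectrum.append((k, k * f, response))
        k += 1
    peak = max(r for _, _, r in spectrum)
    # Normalize so that the strongest component has amplitude 1.
    return [(k, freq, r / peak) for k, freq, r in spectrum]


def strongest_harmonic(f, **kwargs):
    """Return (k, frequency, relative amplitude) of the largest component."""
    return max(harmonic_amplitudes(f, **kwargs), key=lambda row: row[2])


if __name__ == "__main__":
    for f in (125.0, 250.0):
        k, freq, _ = strongest_harmonic(f)
        print(f"f = {f:.0f}: strongest component k = {k} at {freq:.0f} cycles")
```

Under these assumptions the strongest component is simply the harmonic lying nearest the resonance: the seventh (875 cycles) at f = 125 and the fourth (1000 cycles) at f = 250. Raising the pitch therefore moves the individual components but leaves the region of maximum amplitude near 900 cycles.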

According to the inharmonic theory, it is sufficient to say 
that the sound “a” as in “father” is characterized by a 
resonant frequency of 900 and a damping constant of 500 and 
that the air in the mouth cavity in this condition is set into 
vibration by puffs from the vocal cords. 

According to the harmonic theory, although it is necessary to 
give these two numbers as characteristics of the sound, it is 
necessary in addition to specify the kind of exciting force that 
gives the values of F(k). 

The acoustic spectra of the most important vowel sounds 
are shown in Figs. 39 and 40. These spectra were obtained 
from typical wave pictures taken with the high quality oscillograph. 
Only the steady-state part of the wave was analyzed.¹ 

Maxima similar to those in the two calculated cases are 
plainly evident in these charts. Those spectra shown in 
Fig. 39 have one principal region of resonance with indications 
of one or more regions of less importance while those in Fig. 40 
have two principal regions of resonance with other smaller 
ones. It is well to emphasize here the fact that these charts 
represent the results obtained with typical voices. When the 
records of several speakers are analyzed, quite different acoustic 
spectra are obtained, but in general the regions of maximum 
amplitude are approximately the same. 

[Figure 38. Computed acoustic spectra for f = 125 (top) and 
f = 250 (bottom).] 

In order to show the effect of pitch upon the acoustic 
spectra of vowel sounds, an analysis was made of vowels 
intoned at pitches corresponding to the notes of the major 
chord, namely, at frequencies 128, 160, 192, and 256. The 
resulting spectra for e and a are shown in Figs. 41 and 42. 
It will be noticed that for the sound e the frequency regions 
300 and 2300 cycles and for the sound a the regions 
1900 cycles are magnified. For obtaining an analysis of these 
sounds they were recorded on phonograph records by the new 
electrical process and then analyzed by means of the electrical 
harmonic analyzer which is described in Part II, Chapter I. 
For making the analyses, two electromagnetic reproducers 
were arranged to play simultaneously on the same record. A 
key in the electrical circuit was arranged to switch from one 
reproducer to the other. One of the reproducers was returned 
to the beginning of the cut while the other was playing, thus 
making it possible to keep a continuous tone on the input of 
the analyzer. By means of this arrangement, it was possible 
to keep the vowel sounding for the five-minute period required 
to make the analysis. 

¹ These acoustic spectra were computed by W. Koenig of the American Telephone 
and Telegraph Company. 

[Figure 39. Horizontal axis: frequency.] 





[Figure 40. Horizontal axis: frequency.] 


General Characteristics of Speech 

The pitch of the voice when speaking the vowels varies 
with different individuals, corresponding to about 90 cycles 
per second for a very deep-voiced man and to about 300 cycles 
per second for a high shrill-voiced woman. The average 
pitch used by a woman is near middle C or 256 cycles per 
second. The oscillograph records show that there is usually, 
although not always, a rise in pitch as the sound progresses. 
Speaking of the general characteristics as deduced 
from these records, Crandall¹ says: 

“Consider now the general properties of the spoken vowel sound, 
as deduced from these records. First there is a period of rapid 
growth in amplitude, lasting about 0.04 second, during which all 
components are quickly produced, and rise nearly to maximum 
amplitude; second, the middle period, the characteristics of which 
have been noted, lasting about 0.165 second, followed by the period 
of gradual decay lasting about 0.09 second, bringing the total length 
to approximately 0.295 second. There is a tendency to short 
duration among the ‘short’ vowels (e.g., short o, e, i) and a tendency 
to longer records among the broader sounds, as might be expected. 

“The behavior of the fundamental frequency (or ‘cord tone’) 
during the course of the record will follow normal or individual 
characteristics as has been described. 

¹ Bell System Technical Journal, October, 1925. 

[Figure 41.] 






[Figure 42. Horizontal axis: frequency, 0 to 4500 cycles.] 


"The low frequency characteristic appears early, usually before 
the fourth cycle (for men) or before the seventh (for women) and 
normally is in harmonic relation with the fundamental. In the 
eleven pure vowel sounds this point was examined at 264 locations 
in 88 records with the result that the harmonic relation obtained 
in at least 214 cases. On the other hand the normal behavior of 
the amplitude of the low frequency characteristic suggests the decay 
of a transient oscillation during each fundamental cycle, this effect 
being noticeable in at least 64 of the 88 pure vowel records. This 
transient effect was also noticeable in 13 of the 16 records of ar and er 
where the harmonic effect was not so noticeable.¹ The appearance 
of the transient effect depends to some extent on the relative 
frequencies of the fundamental and the characteristic; where the 
fundamental period is short (as often in the case of the women’s 
records) there is not sufficient time for decay of the characteristic 
tone before it receives a new impetus in the next cycle of the 
fundamental. 

TABLE II 

[The body of this table is not recoverable from the source; only its notes survive.] 

Note 1: Both of these sets of frequencies must be characteristic of a. 

Note 2: The high frequency characteristics are less definitely located for short e 
than for any other doubly resonant vowel sound. The frequencies above define a band 
of frequencies centered about 2400 cycles within which the characteristic high 
frequency [...] 

“As noted above, all the records contain high frequency vibrations 
which are of such amplitude that they suggest characteristic 
frequencies. A general mean of these frequencies would be in the 
neighborhood of 3200 cycles, and in the case of two records by speaker 
FC (Group I and Group XIII) the frequency rises to about 5000 
cycles. Recalling the usual classification of the vowel sounds into 
two groups, (1) those of ‘single’ resonance, placed on the left leg 
of the triangle, and (2) those of ‘double’ resonance placed on the 
right leg of the triangle, there are some differences in the behavior 
of the high frequency components which can be related to these 
broad classes. In the sounds of the first class the high frequency 
component is usually small in amplitude, more subject to individual 
bias in its frequency, and may or may not build up in amplitude 
as early as the low frequency characteristic. In the sounds of the 
second class the high frequency characteristic is usually prominent 
from the start and builds up very rapidly; while there is less variation 
in its frequency with the individual speaker. In sounds of the first 
class there is no decided suggestion of a transient in the high 
frequency while in sounds of the second class the transient effect is 
pronounced. 

“With these considerations in mind there is presented in Table II 
a summary of the data obtained from this preliminary examination 
of the vowel records. The mean duration time, and its subdivisions, 
are shown in the second column for each pure vowel sound, with 
mean duration only for the sounds ar (Group VII) and er (Group X). 
The fundamental and characteristic frequencies of each sound are 
shown in the three columns headed ‘Mean Fundamental,’ ‘Mean 
Low Characteristic Frequency’ and ‘Mean High Characteristic 
Frequency’ respectively. Each mean is taken from four records. The two 
columns headed ‘Scattered Low Frequency’ and ‘Scattered High 
Frequency’ contain mean values of additional components, occurring in 
one or more records, in certain frequency ranges, the number of records 
in which such components are noted being shown in parentheses 
following the mean. The table illustrates and emphasizes many 
points which have been brought out in the preceding discussion, 
particularly the closeness with which the high frequency 
characteristics are defined in the vowels of the second or ‘doubly-resonant’ 
class.” 

¹ These are very apt to be diphthongal in character which may account for this 
lack of harmonic effect. 

After considering the work of Stumpf, Miller, Paget, and 
Crandall, Table III was constructed which gives the characteristic 
frequency regions of the vowel sounds. Considerable 
variations from these average values are obtained for different 
speakers or for the same speaker at different times. The 
first six of these vowel sounds are frequently referred to as 
singly resonant and the last six as doubly resonant. In this 
table two characteristic frequencies are given for all of these 
sounds. The intensities of the components in the characteristic 
high frequency range, however, are very much weaker 
for the sounds in the first than for those in the second group. 
For this reason the earlier experimenters did not detect them. 

TABLE III 

Characteristic Frequency of the Vowel Sounds 

Speech Sound      Low Frequency      High Frequency 
u (pool)               400                 800 
u (put)                475                1000 
o (tone)               500                 850 
a (talk)               600                 950 
o (ton)                700                1150 
a (father)             825                1200 
a (tap)                750                1800 
e (ten)                550                1900 
er (pert)              500                1500 
a (tape)               550                2100 
i (tip)                450                2200 
e (team)               375                2400 
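
The use of such a table may be sketched in a few lines: given a pair of observed resonance regions, find the tabulated vowel whose characteristic frequencies lie nearest. The sketch below is illustrative only. The distance measure (ratio of frequencies on a logarithmic scale), the function names, and the test values are assumptions, and the entries 700 for o (ton) and 825 for a (father) are readings of damaged figures. Since considerable variation between speakers is to be expected, the result is a nearest match and nothing more.

```python
import math

# Characteristic frequency regions (low, high) in cycles per second,
# in the order of Table III. The first six sounds are the singly
# resonant group and the last six the doubly resonant group.
TABLE_III = {
    "u (pool)": (400, 800),
    "u (put)": (475, 1000),
    "o (tone)": (500, 850),
    "a (talk)": (600, 950),
    "o (ton)": (700, 1150),      # reading of a damaged entry
    "a (father)": (825, 1200),   # reading of a damaged entry
    "a (tap)": (750, 1800),
    "e (ten)": (550, 1900),
    "er (pert)": (500, 1500),
    "a (tape)": (550, 2100),
    "i (tip)": (450, 2200),
    "e (team)": (375, 2400),
}


def nearest_vowel(low, high):
    """Name of the table entry closest to the observed (low, high) pair.

    Closeness is measured on frequency ratios (an assumed choice),
    so 400 vs 500 counts the same as 2000 vs 2500.
    """
    def distance(entry):
        lo, hi = entry
        return math.hypot(math.log(low / lo), math.log(high / hi))

    return min(TABLE_III, key=lambda name: distance(TABLE_III[name]))


def resonance_class(name):
    """'singly resonant' for the first six sounds, 'doubly resonant' otherwise."""
    position = list(TABLE_III).index(name)
    return "singly resonant" if position < 6 else "doubly resonant"


if __name__ == "__main__":
    guess = nearest_vowel(380, 2350)   # hypothetical observed regions
    print(guess, "-", resonance_class(guess))
```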

In a similar way the records for r, l, ng, m, and n were 
examined. All of these sounds seemed to have three characteristic 
frequency regions of resonance, the third being probably 
due to the intense vibrations of the nasal cavities for 
these sounds. Table IV gives approximately the resonant 
frequencies as deduced from Crandall’s and Paget’s work. 

TABLE IV 

Sound      Throat Resonance      Nasal Resonance      Mouth Resonance 
r              500-700              1000-1600            1800-2400 
l              250-400                 600               2000-3000 
n              200-250                 600               1400-2000 
ng             200-250                 600               2300-2600 
m              250-300                 600                900-1700 

Sixteen consonant sounds were also studied by means of 
these records. A summary of the results as taken from 
Crandall’s article is as follows: 

“B/P. (See Fig. 22.) Both Paget and Miller have noted the 
essential impulsive quality of these sounds, and have produced them 
by sudden closing and opening of the mouth of a resonator. Paget 
considers p to be the more suddenly released, i.e., to have the steeper 
wave-front. From the records this is not evident; following the 
voicing period, the b would seem to be more suddenly produced, as 
judged by the growth in amplitude of the ‘a’ sound following. 

“D/T. (See Figs. 22, 23, and 24.) For these (see either Table V 
or the records themselves) we note a high-frequency characteristic 
of about 4000 cycles. Paget observed 'an upper resonance 5 to 8 
semi-tones higher than that of the associated vowel, and a low 
resonance of about 362.’ We note in the records a low frequency 
of the order of 500 in the case of d. Paget notes a 'greater amplitude 
in t due to higher air pressure’ and the records show a greater ampli- 
tude for the high frequency in the case of t, except right at the 
transition point, where d shows the high frequency of large amplitude. 
No conclusion can be given as to relative steepness of wave-front, 
d vs. t, because in both cases we note for speaker MB a steeper 
wave-front than for MA. The difference between d and t may 
depend entirely on the voicing and on the complicated phenomena 
at the transition point. 


TABLE V 

Group XVI: Six Stop Consonants + Transitional dth/th 

[The body of this table is not recoverable from the source. Its layout is as follows.] 

Rows: the sounds ba, pa, da, ta, ga, ka, dtha, and tha, each recorded by 
speakers MA and MB, with the plate number of each record. 

Columns: 
Duration 
Consonant Characteristics, Near Start: Voicing (Fundamental and Harmonics); High Frequency 
Consonant Characteristics, Mid Portion to End: Voicing; High Frequency 
Transitional Characteristics: Low Frequency; High Frequency (Note 6); No. of Cycles; First Cycle Short 
Vowel Fundamental: Near Start; Near End 

Note 1: A trace of these at beginning of the early fundamental cycles. 
Note 2: One faint transient. 
Note 3: Transients; longer for ta than for da. 
Note 4: One transient. 
Note 5: Irregular transients. 
Note 6: Possibly due in some cases to the a sound. 



“G/K. (See Fig. 29.) k shows the characteristic transients 
(1500, 4000; Table V, notes 4 and 5) to much more pronounced 
degree than g. From the records it would seem that g, in addition 
to the voicing, disclosed a steeper wave-front, the four transitional 
cycles required for k emphasizing this point. No other generaliza- 
tions seem warranted, on account of the complicated series of events 
recorded. These sounds are treated at length by Paget who observes 
considerable variation in their resonant ranges, depending on the 
associated vowel. It will be noted, however, that in these four 
records, particularly consonant characteristics are persistent and of 
large amplitude before the vowel sound begins to appear. 

“DTH/TH. The high frequencies (2600, 3000, 3200) culminating 
at the transition point seem to be the key to these records. 
They are more persistent for dth, while th appears to show the 
steeper wave-front. Paget states that ‘in ð [dth] the middle resonance 
is overblown, . . . louder than the corresponding resonance 
in θ [th].’ He gives also an ‘upper sibilant of 3444-5950’ louder 
for dth than th, and ‘difficult to identify.’ It will be noted that 
in one record for dth there is during the voicing period a faint high 
frequency which has been set down in Table V as 4000 cycles. This 
faint ‘sibilant’ (which may always be audible though it fail to be 
recorded) establishes a certain kinship between these two sounds 
and those following (the fricative consonants) which are rich in 
sibilant sounds. 

“V/F. v shows a pronounced voicing, and as previously noted, 
a less prominent high-frequency component than its partner f, 
or any of the other fricative consonants. Comparing v/f with 
dth/th it seems from the records that the former pair are of higher 
frequency (particularly f) and that for v/f as a unit the higher 
frequency characteristic is more pronounced; just the opposite 
conclusion to that reached by Paget. f may indeed differ more from v 
than v from dth, thus raising difficulties of classification both 
physically and phonetically, which cannot be resolved on the basis of the 
few records available. The exceedingly fine distinction between 
the sounds v and dth could be no more strikingly shown than it is 
in the records given, for both speakers. 

“J/CH. (See Fig. 30.) Some of the recorded phenomena of 
this pair suggest correspondences between them and the pair g/k; 
but the pair j/ch shows a higher frequency characteristic during the 
important mid-portion of its history. Of the pair, ch seems to show 
the steeper wave-front, that is, the more rapid transition to the 
vowel sound. 

“ZH/SH. With this pair we pass to the field of pure sibilants, 
in which there is no evidence of impulsive action or steepness of 
wave-front. The action seems to be that in the voiced sound there 
is, in addition to the presence of the fundamental tone, a breaking 
up of the characteristic high frequency wave-train into discrete 
units corresponding to the fundamental tone, whereas in the unvoiced 
sound the high frequency characteristic is continuous, though 
irregular. Thus noting that the characteristic frequency is of [illegible] to 
4600 cycles the outstanding phenomena of zh, sh are well defined. 
In addition to frequencies of 2048-3249 noted by Paget, he gives a 
‘pronounced middle resonance of 1625-2048.’ This latter observation 
of Paget’s may correspond to the 1800-2000 frequency in the 
records of MB in the transition region, but this component does not 
seem to be prominent in the records. 

TABLE VI 

Group XVII: Fricative Consonants 

[The body of this table is not present in the source.] 

“Z/S. (See Figs. 31 and 32.) The general properties of these 
sounds can be inferred from the discussion of the preceding pair 
(zh/sh), adding only the fact that their principal characteristic is 
of much higher frequency. From Table VI we note a range of 
4200-8000 cycles; Paget gives ‘a characteristic upper resonance of 
5790-6886.’ Paget also gives ‘a middle resonance of 1084-2298.’ 
The records do not show as low a range of characteristic frequencies 
unless it be the frequency range 2200-2800 (see note 1, Table VI), 
within which fall certain vibrations occurring in the early parts of 
the fundamental cycles of the voiced sounds zh and z. The true 
s sound is, as Paget has stated, ‘a relatively complex hiss’ and this 
is true of sh as well. And to complete the record, we must observe 
that zh and z are even more complex, if possible, and thus not 
inappropriate examples of the sounds of speech with which to conclude 
this survey.” 



CHAPTER III 


Speech Power 

Definition of Instantaneous, Average, Mean, Syllabic, Phonetic, 
and Peak Speech Powers 

For purposes of engineering telephone transmission sys- 
tems, it is desirable to know both the acoustic and electrical 
power of the speech being transmitted. If this power becomes 
too small, it is masked by extraneous noise. If it becomes too 
high, parts of the transmitting apparatus become overloaded; 
that is, they fail to properly transmit the speech. Inasmuch 
as speech power is so variable, it has been found convenient 
to use several quantities in describing it, such as the instan- 
taneous, the average, the mean, the syllabic, the phonetic, 
and the peak speech powers. 

The instantaneous power is the rate at which the sound energy 
is being radiated at any instant, and it frequently rises to 
values higher than one hundred times the average power. 

The average speech power is the total speech sound energy 
radiated while a person is speaking divided by the time interval 
during which he speaks. 

It is of interest to know the slow variations of the speech 
power such as would be recorded by the usual type of volt- 
meter or ammeter when placed in the telephone circuit over 
which speech is being transmitted. For describing these 
variations the notion of mean power is useful. The mean 
power is a function of the time which shows the slow variations 
of the speech power without showing the periodic fluctuations 
of the wave. It can be determined from the wave-form 
pictures such as are shown in Chapter II. To do this the 
average power over each one-hundredth second is calculated. 
Then a curve is plotted using these short interval averages as 
ordinates and the corresponding time values as abscissas. 
Such a curve for a syllable, a word, a sentence, or a series of 
sentences gives a mean speech power curve and indicates to 
the eye the variations in loudness as sensed by the ear. 

The syllabic speech power is useful in describing the power 
used in various syllables. Inasmuch as it is difficult to deter- 
mine the exact beginning or ending of a syllable, the maximum 
mean power attained while it is spoken is taken as a measure 
of the syllabic power. 

The phonetic speech power is the maximum value of the 
mean power while one of the fundamental vowel or consonant 
sounds is being spoken. It is useful in comparing the relative 
amounts of power used in producing the different phonetic 
sounds. The syllabic power is usually the phonetic power of 
the vowel in the syllable. 

The peak speech power is the maximum value of instantaneous 
power during the interval considered. In Fig. 43 is 
shown a chart¹ which illustrates these kinds of speech power 
for the word “quite.” The instantaneous power varies from 
zero to high values for each cycle of the wave. The mean 
power slowly rises to about 40 microwatts, which is the syllabic 
power; it then decreases to zero. The peak power rises to 
1500 microwatts. The phonetic power of the sound i is 
40 microwatts or the same as the syllabic power. 

¹ Taken from article by C. F. Sacia, “Speech Power and Energy,” Bell System 
Technical Journal, October, 1925. 

Speech Power Definitions 
All Measured in Microwatts 

Instantaneous S.P.: Rate at which sound energy is being radiated at 
                    any instant. 

Average S.P.:       Total speech energy radiated over any period 
                    divided by the length of the period. 

Mean S.P.:          Average S.P. over each one-hundredth of a 
                    second period. 

Syllabic S.P.:      Maximum value of mean S.P. of one syllable. 

Phonetic S.P.:      Maximum value of mean S.P. of one 
                    fundamental vowel or consonant. 

Peak S.P.:          Maximum value of instantaneous power 
                    over the interval considered. 
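
These definitions translate directly into a short computation on a sampled record of instantaneous power. The following is a minimal sketch and not the procedure used to prepare the charts: the sampling rate, the synthetic test signal, and the function names are all assumptions, and the power is simply taken as proportional to the square of the pressure.

```python
import math


def mean_power_curve(instantaneous, sample_rate, window=0.01):
    """Average the instantaneous power over successive windows.

    The window defaults to one-hundredth of a second, as in the
    definition of mean speech power. The last window may be shorter.
    """
    n = max(1, int(round(sample_rate * window)))
    curve = []
    for start in range(0, len(instantaneous), n):
        chunk = instantaneous[start:start + n]
        curve.append(sum(chunk) / len(chunk))
    return curve


def speech_powers(instantaneous, sample_rate):
    """Average, syllabic, and peak power for a record of one syllable."""
    mean_curve = mean_power_curve(instantaneous, sample_rate)
    return {
        "average": sum(instantaneous) / len(instantaneous),
        "syllabic": max(mean_curve),   # maximum of the mean power curve
        "peak": max(instantaneous),    # maximum instantaneous power
    }


if __name__ == "__main__":
    # Hypothetical record: a 200-cycle tone rising and falling over 0.3 second.
    rate = 8000
    record = []
    for i in range(int(0.3 * rate)):
        t = i / rate
        envelope = math.sin(math.pi * t / 0.3)
        pressure = envelope * math.sin(2.0 * math.pi * 200.0 * t)
        record.append(pressure ** 2)
    print(speech_powers(record, rate))
```

The phonetic power is obtained in the same way as the syllabic power, by applying `speech_powers` to the portion of the record occupied by a single vowel or consonant. For any record the three quantities fall in the order average, syllabic, peak, since each is a maximum or an average of the one before.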



[Fig. 43. Enlarged Copy of Original Oscillogram of the Word “quite.” 
Time scale marked at 0.05, 0.10, 0.15, and 0.20 sec.] 


Average Speech Power 

The speech power may be most conveniently measured by 
means of a calibrated condenser transmitter. A description 
of the method of calibrating a condenser transmitter by means 
of a thermophone is given in Appendix A. From the voltage 
developed at its terminals, the pressure on its diaphragm can 
be calculated. It may be shown (see Appendix B) that during 
the passage of a sound wave through any medium the power J 
in ergs per second radiated through a unit area of the wave 
front is given by 

$$ J = \frac{p^2}{r} \qquad (1) $$

where p is the r.m.s. value of the pressure variation expressed 
in bars (dynes per square centimeter) and r is the radiation 
resistance of the medium. The value of the radiation resistance 
is equal to the velocity of propagation of the sound wave 
multiplied by the density of the medium in which it is travelling. 
When r is expressed in c.g.s. units its value for air 
at 20° C is 41.5; for water it is 143,000. It is evident then 
that a sound wave in water must produce fifty-eight times 
as much pressure variation to carry the same sound energy 
as a similar sound wave in air. For speech waves in air at 
20° C the intensity J in microwatts per square centimeter is 
given by 

$$ J = \frac{p^2}{415} \qquad (2) $$

where p is expressed in bars. 

The power passing through a square centimeter at a convenient 
distance from the mouth can thus be determined. 
Multiplying this by the area of a hemisphere having a radius 
equal to this distance from the mouth gives approximately 
the total power of the speech sounds. 
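
The arithmetic of this measurement may be sketched as follows. The sketch takes the plane-wave relation J = p²/r between intensity and r.m.s. pressure, the radiation resistances quoted above, and the conversion 1 erg per second = 0.1 microwatt. The function names and the example pressure and distance are hypothetical and are not measured values.

```python
import math

R_AIR = 41.5         # radiation resistance of air at 20 deg C, c.g.s. units
R_WATER = 143_000.0  # radiation resistance of water, c.g.s. units


def intensity_microwatts(p_rms_bars, r=R_AIR):
    """Intensity in microwatts per square centimeter for an r.m.s. pressure in bars."""
    ergs_per_second = p_rms_bars ** 2 / r   # J = p^2 / r, per square centimeter
    return ergs_per_second * 0.1            # 1 erg per second = 0.1 microwatt


def total_power_microwatts(p_rms_bars, distance_cm):
    """Intensity multiplied by the area of a hemisphere of the given radius."""
    hemisphere_area = 2.0 * math.pi * distance_cm ** 2
    return intensity_microwatts(p_rms_bars) * hemisphere_area


def pressure_ratio(r_a=R_WATER, r_b=R_AIR):
    """Ratio of the pressures that carry equal intensity in two media."""
    return math.sqrt(r_a / r_b)


if __name__ == "__main__":
    # Hypothetical reading: 2 bars r.m.s. at 5 cm from the mouth.
    print(total_power_microwatts(2.0, 5.0), "microwatts")
    print(pressure_ratio(), "times the pressure in water as in air")
```

For air the first function reduces to p²/415, and the pressure ratio for water against air comes out between 58 and 59, in agreement with the figure of fifty-eight quoted in the text.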

By taking the average speech power for a number of 
individuals talking in their usual conversational manner, it 
has been found that the average speech power for American 
speech is approximately 10 microwatts. If the silent intervals 
during conversation are excluded, this average is increased 
approximately 50 per cent. To carry this amount of power 
the air particles near the mouth vibrate through a distance of 
the order of [illegible] millimeter. When this amount of sound 
energy is received directly into the ear, it seems rather large 
due to the large excitation it produces on the auditory sense. 
However, it is really very small in comparison with the other 
powers ordinarily encountered. For example, it takes power 
equivalent to that produced by more than one million voices 
to light an ordinary incandescent lamp. It is therefore 
evident that the electrical currents used to transmit speech 
are of a different order of magnitude than those used to 
transmit power for lighting and heating purposes. It is only 
in some of the larger broadcasting stations that electrical 
speech currents are comparable in size with those used in 
power work. 

When one talks about as loudly as possible the average 
speech power increases approximately to 1000 microwatts. 
When one talks in as weak a voice as possible without whispering 
it drops to 0.1 microwatt. A very soft whisper is about 
0.001 microwatt. 

Tests made with a large number of speakers talking into 
the transmitter of a commercial telephone system showed that 
variations in the characteristic average speech power used by 
different individuals are approximately as shown in Table VII. 
The values¹ are given in fractions or multiples of the average 
speech power. These figures are based upon the electrical 
powers flowing from the terminals of the transmitter while it 
is being agitated by the various speakers. 

TABLE VII 

Relative Speech Powers Used by Individuals in Conversation 

Region of average     below   1/16-   1/8-   1/4-   1/2-                        above 
speech power          1/16    1/8     1/4    1/2     1     1-2    2-4    4-8      8 

Per cent of speakers    7      9      14     18     22     17      9      4       0 

¹ These values were obtained by L. J. Sivian of Bell Telephone Laboratories. 


Decibel, Sensation Unit, Sensation Level 

When discussing problems involving mainly differences in 
powers, either electric or acoustic, it is convenient to use a 

logarithmic unit called the decibel and recently adopted for 
use in telephone engineering. If J and J₀ are the two different 
amounts of power being compared, then the difference in 
power level α expressed in bels is given by the equation 

$$ \alpha = \log_{10} \frac{J}{J_0} \qquad (3) $$

A unit which is one-tenth of the bel is a more convenient size 
for most practical work and is the one used in telephone 
engineering in this country. It is called the “decibel” and is 
usually designated by the abbreviation “db.” For example, 
in the discussion on speech power, if the average speech power 
is taken as a zero level, that is, taken as the comparison level, 
then the level of very loud speech would be + 20 db, of weak 
speech - 20 db, and of a soft whisper - 40 db. The range 
of levels from a soft whisper to very loud speech is 60 db. 
From the figures shown in Table VII it is seen that using this 
unit, the range of average powers used in conversation by 93 
per cent of speakers is about 21 db. 
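
The levels quoted above can be checked with a few lines of arithmetic. In the sketch below the powers are those given in the text (10 microwatts average, 1000 for very loud speech, 0.1 for weak speech, 0.001 for a soft whisper). The function name is an illustrative choice, and the limits of 1/16 and 8 times the average for the 93 per cent of speakers are a reading of the damaged headings of Table VII.

```python
import math


def level_db(power, reference):
    """Difference in power level in decibels: ten times the level in bels."""
    return 10.0 * math.log10(power / reference)


AVERAGE = 10.0  # average conversational speech power, microwatts

if __name__ == "__main__":
    cases = [("very loud speech", 1000.0),
             ("weak speech", 0.1),
             ("soft whisper", 0.001)]
    for label, microwatts in cases:
        print(f"{label}: {level_db(microwatts, AVERAGE):+.0f} db")

    # Range from a soft whisper to very loud speech.
    print(f"whisper to loud: {level_db(1000.0, 0.001):.0f} db")

    # Range covered by speakers between 1/16 and 8 times the average.
    print(f"Table VII range: {level_db(8.0, 1.0 / 16.0):.1f} db")
```

The last figure is ten times the logarithm of 128, or very nearly 21 db, which is the range stated for 93 per cent of speakers.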

This unit is also used by otologists, psychologists, and 
physiologists in describing the magnitude of sounds being listened 
to by the ear. It is a well-known psychological law 
that equal steps on such a logarithmic scale sound approxi- 
mately like equal loudness steps. A change of the power 
level of a sound by one decibel is approximately the smallest 
that the ear can detect. When this unit is used in this con- 
nection the term “sensation unit” has come into use. How- 
ever, since the telephone companies of both Europe and 
America have adopted the names bel and decibel, it seems 
desirable that these names be used universally if possible. 
Unless otherwise stated, the decibel will be used in this book 
for representing levels of intensity. 

The sensation level of any sound reaching the ear is the 
number of sensation units it is above the threshold level for 
audition. If the lips of an average speaker are held within 
half an inch of the ear of a person having normal hearing, 
the sensation level of the speech received is about 100 db. 



The intensity of the speech or the speech power per unit area 
which actuates the ear under such circumstances is called the 
initial speech intensity. Although, as mentioned before, the 
average speech power is only one-millionth that used for an 
electric light, it can still be attenuated 10¹⁰ times, or to one 
ten-billionth, before it ceases to affect the auditory sense. This 
matter will receive further consideration in a later chapter. 

Power in the Fundamental Speech Sounds 

In the course of conversation the fundamental vowel and 
consonant sounds are produced with varying degrees of power 
depending upon their position in the sentence and the emphasis 
desired. In spite of this variation some of the speech sounds 
are always much more powerful than others and it is interesting 
to know typical values used in conversation. 

The phonetic and peak powers of the individual speech 
sounds can be obtained by means of a calibrated condenser 
transmitter. Sacia and Beck¹ have obtained in this way 
from measurements of oscillograms some values for most of 
the speech sounds. Although sixteen people were used in 
obtaining these data and the sounds were made in various 
combinations, they are still insufficient to give average values which can 
be said to be typical. However, the values obtained do give 
a good notion of the range of powers involved. These data 
are given in Table VIII under the columns headed “Phonetic 
Power” and “Peak Power.” The figures are in microwatts of 
power radiating from the mouth of the speaker. 

As a check against the results obtained by this method ^ 
there is given in the last column of Table VIII a set of figures 
which are the average values of two other methods used for 
determining phonetic power. These other methods are 
described in Part IV, Chapter IV, and will be only briefly 
outlined here. The first method uses a form of articulation 
test, taking advantage of the fact that as the speech level is 
decreased the weaker sounds will be the first to be misunder- 


^ Bell System Technical Journal^ 


POWER IN THE FUNDAMENTAL SPEECH SOUND 71 

stood since the ear fails to hear their essential characteristics 
first. By recording the amount of attenuation necessary to 

TABLE VIII

Power in Microwatts in the Fundamental Speech Sounds

                                                           Calculations
Phonetic  Key      Phonetic Power       Peak Power         from Threshold
Sound     Word     Average  Maximum     Average  Maximum   and Articulation
                                                           Measurements

ū         tool     23       60          235      700       38
u         took     16       100         470      890       50
ō         tone     ?5       80          435      1300      74
o′        talk     45       120         615      1500      87
o         ton      24       110         450      1700      83
a         top      41       120         700      1600      68
a′        tap      25       90          650      1800      57
e         ten      11       90          500      1700      34
ā         tape     ?        60          5?5      1700      35
i         tip      20       50          350      1300      22
ē         team     20       80          310      1500      16
m         me       1.8      ?           110      200       2.9
n         no       2.1      18          47       70        4.1
ng        ring     .3       3.6         97       170       12
l         let      .3       9.6         130      230       18
r         err      16       30          200      600       33
v         vat      .03      2.4         ?5       30        1.0
f         for      .08      3.6         3        4         1.0
z         zip      .7       7.2         30       40        1.2
s         sit      .9       8.7         30       55        .9
th        thin     -        -           1        1         .3
th        that     -        -           ?        10        2.3
zh        azure    ?        ?           40       55        -
sh        shot     1.8      6.0         110      130       11
b         bat      -        -           7        7         1.1
p         pat      -        -           6        7         1.0
d         dot      .08      2.9         -        7         1.7
t         tap      .1       6.0         16       -         2.7
j         jot      .5       3.?         24       36        4.1
ch        chat     1.4      19          52       60        6.1
g         get      -        -           8        9         3.3
k         kit      .3       4.8         6        9         3.0



reduce each speech sound to the point where it is misunderstood
some arbitrary per cent of the number of times uttered, it is
possible to obtain approximately the relative power of each
sound.

The second method reduces each sound to the level at
which it may no longer be heard. This is done by using a
telephone circuit giving a very faithful reproduction of the
speech sounds and into which are introduced suitable attenuators.
The number of db that each sound must be attenuated to
make it inaudible is thus a measure of its phonetic power.

Table IX shows the individual results of these last two
methods and also their averages. The figures in the last
column of Table VIII were obtained from the averages of
these two methods by use of the “threshold” or minimum
audible sound pressures which are determined as described
in Part III, Chapter II. Here it is shown that the minimum
pressure at from one to three thousand cycles is approximately
.0006 bar. Substituting this value in equation (2) gives

$$\frac{(.0006)^2}{415} = 8.7 \times 10^{-10}. \tag{4}$$

The area of the hemisphere through which the sound is passing
is approximately 10 sq. cm. when the lips of a speaker are ½ inch
from the ear. Consequently, the power of this minimum
sound is approximately $87 \times 10^{-10}$ microwatts. Having,
from the “average” column of Table IX, the amounts the
various sounds must be attenuated to reach this minimum
level, a simple calculation gives the phonetic power of each at
normal levels and these are the values listed in the last column
of Table VIII.
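
The "simple calculation" referred to can be sketched as follows. This is a minimal sketch under the figures stated in the text (threshold pressure of .0006 bar, the p²/415 relation of equation (4) giving microwatts per sq. cm., and a hemisphere of 10 sq. cm.); the function and constant names are assumptions introduced here for illustration.

```python
THRESHOLD_PRESSURE_BAR = 0.0006   # minimum audible pressure, from the text
HEMISPHERE_AREA_SQ_CM = 10.0      # lips about 1/2 inch from the ear, from the text

def intensity_microwatts_per_sq_cm(pressure_bar):
    """Intensity from pressure using the p^2 / 415 relation of equation (4)."""
    return pressure_bar ** 2 / 415.0

def phonetic_power_microwatts(sensation_level_db):
    """Power of a sound lying sensation_level_db above the threshold power."""
    threshold_power = (intensity_microwatts_per_sq_cm(THRESHOLD_PRESSURE_BAR)
                       * HEMISPHERE_AREA_SQ_CM)
    return threshold_power * 10.0 ** (sensation_level_db / 10.0)

# Sensation levels taken from the "average" column of Table IX.
print(round(phonetic_power_microwatts(100.0), 1))  # sound in "talk"
print(round(phonetic_power_microwatts(75.0), 2))   # th as in "thin"
```

For a sensation level of 100 db this gives roughly 87 microwatts, and for 75 db roughly .3 microwatt, in line with the corresponding entries in the last column of Table VIII.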

After considering the different sets of data, Table X was
constructed, which gives the relative amounts of power in the
different sounds using the power in the faintest sound as a
basis of comparison.

It is seen that the most powerful sound is o′ (awl), and
the faintest sound th (thin), the ratio of powers between these
two being 680. The difference in level expressed in db
corresponding to this figure is 28. From the data available the
indications are that in an average room in the city the noise
is such as to raise the threshold approximately 30 db. Also

TABLE IX

Sensation Levels Produced by an Average Speaker for the Fundamental
Speech Sounds

Speech Sound    Threshold    Articulation    Average

o′ (talk)       100.0        100.0           100.0
o (ton)         99.6         100.0           99.8
ō (tone)        99.6         98.9            99.3
ī (bite)        99.5         100.0           99.8
ou (bout)       99.2         100.0           99.6
a′ (tap)        99.2         97.2            98.2
e (ten)         98.4         93.5            95.9
a (top)         97.4         100.3           98.9
u (took)        97.1         98.1            97.6
ū (tool)        95.9         94.3            95.1
ā (tape)        93.3         98.2            99.8
i (tip)         92.6         95.5            94.0
ē (team)        89.4         96.3            92.9
r (err)         96.0         95.5            95.8
l (let)         93.5         92.6            93.1
ng (ring)       88.9         93.8            91.4
sh (shot)       88.9         93.2            91.1
ch (chat)       87.2         89.7            88.5
n (no)          86.8         86.7            86.75
m (me)          85.4         85.1            85.3
th (that)       84.2         -               84.2
t (tap)         84.1         86.4            85.3
h (hat)         83.9         81.7            82.8
k (kit)         83.8         85.3            84.6
j (jot)         83.7         89.7            86.7
f (for)         83.6         77.7            80.7
g (get)         82.9         86.9            84.9
s (sit)         82.4         78.1            80.3
z (zip)         81.6         81.6            81.6
v (vat)         81.4         80.1            80.8
p (pat)         80.6         81.4            81.0
d (dot)         78.9         87.8            83.4
b (bat)         78.8         83.7            81.3
th (thin)       78.7         71.2            75.0



the sound is attenuated more than 40 db if the speaker is about
10 feet away from the listener. Consequently, under such
circumstances the sound th is barely audible.
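
The reasoning behind this conclusion is a simple budget of levels, which may be sketched as below. The sketch assumes that the three quantities simply add in db, using the 75 db average for th (thin) from Table IX, the 30 db rise in threshold due to room noise, and the 40 db attenuation at about 10 feet; the function name is an assumption introduced here.

```python
import math

def level_above_threshold(sensation_level_db, noise_shift_db, distance_loss_db):
    """Remaining level above the (noise-raised) threshold, in db."""
    return sensation_level_db - noise_shift_db - distance_loss_db

# th (thin): 75 db close to the ear, 30 db masking by room noise,
# at least 40 db lost when the speaker is about 10 feet away.
margin_db = level_above_threshold(75.0, 30.0, 40.0)
print(margin_db)

# The ratio of 680 between the strongest and faintest sounds, in db.
print(round(10.0 * math.log10(680.0)))
```

With only about 5 db remaining, and less if the attenuation exceeds 40 db, the faintest sound is at the edge of audibility, while the strongest sound, some 28 db higher, is still comfortably heard.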

The pure vowels are the most powerful sounds and have a
range of intensity of 3 to 1. As would be expected, the open
vowels o′, a, o, and a′ have the largest phonetic powers. The
diphthongs are not given but they have about the same power
as the vowels which compose them.

The semi-vowels are next to the pure vowels in phonetic
power. Of these, n is the weakest and r the strongest. It
is interesting to note that the unvoiced fricatives, sh and ch,
have powers comparable to the semi-vowels. Next follow

TABLE X

Relative Phonetic Powers of the Fundamental Speech Sounds as Produced
by an Average Speaker

o′    680      ū     310      ch    42      k            13
a     600      i     260      n     36      v            12
o     510      ē     220      j     ?3      th (that)    11
a′    490      r     210      zh    20      b            7
ō     470      l     100      z     16      d            7
u     460      sh    80       s     16      p            6
ā     370      ng    73       t     15      f            5
e     350      m     52       g     15      th (thin)    1


the stop and fricative consonants; z, s, t, g, v, and th having
about the same power, which is about one-fifth that of the
semi-vowels, and then b, d, p, and f having a slightly lower
power.

The syllabic power varies more with the emphasis given
than with the vowel sound used. A vowel in an accented
syllable has usually three or four times as much phonetic
power as one in an unaccented syllable. This difference is
dependent upon the speaking habits of the individual.

The peak power varies considerably with the type of
voice, the values given in Table VIII being typical. For
engineering purposes, it may be considered to be about five
times the syllabic power. In this connection Sacia ¹ says:

“[We] have become able to associate peak factors with vocal qualities
in the following way: the voices with the higher peak factors are
those which in the ordinary terminology are said to be ‘resonant’
or ‘vibrant’; they have the greater carrying power, especially over
the telephone; they are rich in the musical sense and are, therefore,
well suited to singing, although many such voices, unfortunately,
are never applied to the art.”

It is seen then from Table VIII that for an accented
syllable the peak power frequently rises to 700 microwatts.
For the 4 per cent of speakers who are in the class producing
an average power of from four to eight times the average for
the entire group, this peak value might reach as high as
5000 microwatts.

From the data given it is estimated that the average
phonetic power in the faintest speech sound, namely, th (as in
thin), is approximately .05 microwatt. This value, of course,
is for one speaking with a typical average voice. For those
speaking with softer voices a much smaller power for this
sound would be produced. The reduction, however, would
not be as great as for the vowel sounds. A round figure of
about .01 microwatt probably represents the faintest sound
and of about 5000 microwatts the peak value of the loudest
sound that will be encountered in conversation. This
represents a range in intensity of 500,000 to 1 or 56 db. When
dealing with only one speaker, the range of intensity of the speech
sounds is usually between 35 and 40 db.
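
As a check on the arithmetic of this range, the ratio and its level can be computed directly. The sketch below is illustrative only and the function name is an assumption; the straightforward calculation comes out at very nearly 57 db, close to the round figure quoted.

```python
import math

def intensity_range_db(loudest_microwatts, faintest_microwatts):
    """Range between two powers expressed in db."""
    return 10.0 * math.log10(loudest_microwatts / faintest_microwatts)

loudest = 5000.0   # peak value of the loudest sound, microwatts (from the text)
faintest = 0.01    # round figure for the faintest sound, microwatts (from the text)

print(round(loudest / faintest))                       # ratio of powers
print(round(intensity_range_db(loudest, faintest), 1))  # range in db
```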

All of the figures given are based upon average American
speakers. Although no measurements have been made upon
persons who are specially trained to speak distinctly, such as
actors and public speakers, it is very probable that such tests
would show that the weaker sounds would be given considerably
more power by such trained speakers than is ordinarily
used by the average speaker. Due to this cause, the range
of intensities used by such speakers would be narrower than
indicated by these figures. However, the greater range of
emphasis used by them would tend to make the necessary
intensity range wider.

¹ Sacia, C. F., “Speech Power and Energy,” Bell System Technical Journal,
October, 1925.

Relative Distribution of Speech Power into Frequency Bands


The frequency range necessary for the faithful transmission
of speech is of considerable importance. In Chapter II were
given the characteristic frequency regions, and in the present
chapter the characteristic intensity regions for the various
speech sounds are given. Figure 44 is a plot showing these
combined characteristics of most of the fundamental sounds
of speech. The ordinates give the sensation level of the
principal components and the abscissas the characteristic
frequency regions of each speech sound. When a sound has
several principal components the position of each is indicated.
Although it cannot be claimed that this chart gives more than
a very rough picture of the true facts, it may serve to give a
general picture of the intensities and frequencies involved in
the transmission of speech.

Figure 44. (Horizontal axis: frequency.)

More accurate data for the frequency-energy distribution
for speech as a whole have been obtained by Crandall and
MacKenzie.¹ The method consisted of analyzing the speech
waves which were impressed upon a condenser transmitter by
using a resonant circuit to transmit narrow frequency bands of
energy and pronouncing the separate syllables of the connected
speech so slowly that the kick of a direct current galvanometer
could be separately read for each syllable. The circuit used
for this purpose is shown in Fig. 45.

Fig. 45. Circuit for Determining the Frequency-Energy Distribution in
Speech.

The sound waves which fall upon the condenser transmitter are
transmitted through the three-stage amplifier and then into the
twin single-stage amplifiers. Connected to the output of one
of these amplifiers is the resonant circuit which limits the band
of frequencies being transmitted. Connected to the other
amplifier is a circuit which transmits all the frequencies. By
changing the tuning of the first circuit the different bands of
speech are transmitted. When a syllable is spoken, simultaneous
readings are taken of both meters, one reading corresponding
to the total energy of the syllable uttered, the other
to the energy of the syllable lying within the limits of
transmission of the tuned circuit. Twenty-three bands of
frequencies were used in the experiment so that each syllable
was repeated this number of times. Any variations in loudness
of the syllable produced by the speaker could be detected
by the first meter and the necessary corrections made. The
properties of the circuit were such that for the different
frequency settings the widths of the band were not the same.
A correction was therefore made to make the readings for
these different widths comparable. After these corrections
were made and the data reduced to a comparable basis, the
relative amounts of energy in the various regions were
computed. Figure 46 shows the results ² of such an analysis for
six voices. The curves are arranged so that energy in any
particular frequency band is proportional to the area included
between the two ordinates erected at the limits of this
frequency band, the curve, and the “X” axis. These curves
were obtained by each of the six speakers pronouncing the test
sentence of fifty syllables for each of the twenty-three
frequency settings, making 6900 observations. As would be
expected, the energy-frequency distribution for the different
voices shows characteristic differences. The curve giving the
average of these six voices is shown in Fig. 47.

Fig. 46. Analyses of Individual Voices. (Six panels: Voices A, B, C, and
D, male; Voices E and F, female. Horizontal axis: frequency, 0 to 5000
cycles.)

¹ Physical Review, March, 1922, pp. 221-232.

² Physical Review, March, 1922, p. 227.

The region containing the maximum amount of energy is
at a frequency near that corresponding to the fundamental
pitch ordinarily used in producing the vowel sounds. Although
the experiments were carried to frequencies no higher than
5000 cycles, it is well known from other sources that there is
still energy in some of the speech sounds up to as high as
10,000 cycles. It is important to notice that this distribution
curve was obtained by the particular method just described
of obtaining an average from a group of speakers producing
a group of speech sounds. The occurrence of considerable
energy in any particular frequency region does not necessarily
imply that high sound intensities have been produced in this
region. Such high energy values may be obtained either by
the occurrence of a large number of speech sounds having
components of comparatively low intensities in this region or
by the occurrence of a few sounds having components of
comparatively high intensities in this region. For some
engineering work it is important to distinguish between these
two possibilities.

Figure 47. Energy-Frequency Distribution of Average Speech. (Horizontal
axis: frequency n.)



CHAPTER IV 


Frequency of Occurrence of the Different 
Speech Sounds 

Words 

After learning the physical characteristics of the speech
sounds it is natural to inquire how frequently they are used in
conversational speech. It is evident that such knowledge
will be useful in telephone engineering as well as in other
fields. Godfrey Dewey ¹ has made an extensive study of the
frequency of occurrence of words, syllables, and fundamental
vowel and consonant sounds in written material. This material
was taken from representative sources such as modern
newspapers, fiction, American speeches, personal correspondence,
business correspondence, modern advertising, religious
English, scientific English, and American magazines. As a
result of this study, Dewey found the 100 most frequently
occurring words to be those shown in Table XI. The numeral
gives in per cent the frequency of occurrence of the word.

The definite article, “the,” accounts for more than 7 per cent
of all the words occurring on an average written page. Some
words, such as winter, to-morrow, succeed, and railroads,
which seem very familiar, occur only once in 10,000 words.
The first 10 words given in this list account for more than
25 per cent and the 100 words account for more than 50 per
cent of all the words occurring.
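
The statement about the first ten words can be verified by adding the corresponding entries of Table XI. The following is a small illustrative sketch; the variable and function names are assumptions introduced here, and the percentages are those listed for the ten most frequent words.

```python
# Frequency of occurrence, in per cent, of the ten most frequent words (Table XI).
top_ten_words = [
    ("the", 7.31), ("of", 3.99), ("and", 3.28), ("to", 2.92), ("a", 2.12),
    ("in", 2.11), ("that", 1.34), ("it", 1.21), ("is", 1.21), ("I", 1.15),
]

def cumulative_share(entries):
    """Total per cent of running text accounted for by the listed words."""
    return sum(per_cent for _, per_cent in entries)

total = cumulative_share(top_ten_words)
print(round(total, 2))
```

The total is a little under 27 per cent, which bears out the statement that these ten words account for more than a quarter of all the words occurring.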

¹ Dewey, Godfrey, “Relative Frequency of English Speech Sounds,” 1923, Harvard
University Press, Cambridge, Mass.

Syllable Combinations

Dewey’s studies indicated also that the 100 most frequently
occurring syllables are as shown in Table XII. The figure at


TABLE XI

Relative Frequency of Occurrence of Words

7.31  the       ?    not       .31  their     .20  time      .15  these
3.99  of        .58  at        .30  there     .20  up        .14  two
3.28  and       .57  this      .30  were      .20  do        .14  very
2.92  to        .54  are       .30  so        .20  out       .13  before
2.12  a         .52  we        .29  my        .19  can       .13  great
2.11  in        .51  his       .26  if        .19  than      .13  could
1.34  that      .50  but       .25  me        .18  only      .13  such
1.21  it        .47  they      .25  what      .18  she       .13  first
1.21  is        .46  all       .25  would     .17  made      .12  upon
1.15  I         .45  or        .24  who       .16  other     .12  every
1.03  for       .45  which     .23  when      .16  into      .12  how
 .84  be        .44  will      .23  him       .16  men       .12  come
 .83  was       .43  from      .22  them      .16  must      .12  us
 .78  as        .41  had       .22  her       .16  people    .12  shall
 .77  you       .39  has       .21  war       .16  said      .11  should
 .72  with      .36  one       .21  your      .16  may       .11  then
 .68  he        .33  our       .21  any       .15  man       .11  like
 .64  on        .33  an        .21  more      .15  about     .11  well
 .61  have      .32  been      .21  now       .15  over      .11  little
 .60  by        .32  no        .20  its       .15  some      .11  say

the left of the phonetic syllable gives in per cent the frequency 
of occurrence. 


Fundamental Sounds 

The analysis of the phonetic pronunciation of the words
enabled Dewey to find the frequency of occurrence of each
of the fundamental speech sounds given in Table I. His
values are given in Table XIII.

It is seen from this table that the sound i (as in tip) is the
most frequently occurring phonetic sound. The sounds n, t,
r, and o (as in ton), are the next four sounds in the order of
their frequency of occurrence and they account for more than
36 per cent of all the sounds found on a written page. It is
seen from the tables given above that a comparatively small
part of the more common words comprise the large part of
our ordinary speech. That this is true is emphasized by the
following summary taken from Dewey’s book.

General Summary

   9 words     are found to form over 25 per cent of the total words
  12 syllables are found to form over 25 per cent of the syllables
   4 sounds    are found to form over 25 per cent of the sounds

  69 words     are found to form over 50 per cent of the words
  70 syllables are found to form over 50 per cent of the syllables
   9 sounds    are found to form over 50 per cent of the sounds

 732 words     are found to form over 75 per cent of the words
 339 syllables are found to form over 75 per cent of the syllables
  19 sounds    are found to form over 75 per cent of the sounds

1027 words occurring over ten times form 78.6 per cent of the words
1370 syllables occurring over ten times form 93.4 per cent of the syllables
41 + 1 sounds form 100 per cent of the sounds


TABLE XII

Relative Frequency of Occurrence of Syllables

7.3   the       ?    waz       .57  oi        .39  on        .28  os
4.0   av        .84  with      .56  kan       .38  men       .28  wud
3.3   in        .84  di        .54  we        .38  6rz       .28  som
3.3   and       .83  ti        .53  ez        .38  our       .28  what
3.2   i         .82  an        .52  bot       .34  en        .28  if
3.2   o         .78  az        .52  hiz       .34  mi        .28  on
3.2   tu        .71  o         .48  tha       .34  thar      .27  kom
2.4   ing       .70  he        .48  no        .33  op        .27  yu
2.1   or        .69  a         .47  wil       .33  out       .27  da
1.6   ri        .68  61        .47  on        .33  bin       .26  nes
1.4   it        .68  en        .46  6a        .33  wor       .26  el
1.3   that      .66  hav       .46  dn        .32  thar      .26  si
1.3   iz        .64  bl        .45  which     .31  ev        .26  them
1.3   1         .64  ar        .45  so        .31  me        .26  dis
1.2   li        .62  bi        .43  from      .31  tu        .26  oth
1.1   for       .61  at        .41  had       .30  ex        .25  hu
 .97  be        .60  6r        .41  won       .29  its       .25  v6r
 .92  shon      .60  nat       .40  ment      .29  kan       .25  when
 .91  ed        .58  tor       .39  haz       .29  d6r       .24  du
 .86  yu        .57  this      .39  bul       .29  him       .24  pe





TABLE XIII

Relative Frequency of Occurrence of Speech Sounds

Speech            Relative       Speech            Relative
Sound    Key      Frequency      Sound    Key      Frequency

ū        tool     1.60           m                 2.78
ō        tone     1.63           n                 7.24
o′       talk     1.26           ng       hang     0.96
a        top      3.33           v                 2.28
ā        tape     2.35           z                 2.97
ē        eat      3.89           th       then     3.43
u        took     0.69           zh       azure    0.05
o        ton      5.02           f                 1.84
a′       tap      4.17           s                 4.55
e        ten      3.44           th       thin     0.37
i        tip      7.94           sh       shell    0.82
ī        dike     1.59           b                 1.81
ou       our      0.59           d                 4.31
oi       oil      0.09           j                 0.44
ew       few      0.31           g                 0.74
w                 2.08           p                 2.04
y                 0.60           t                 7.13
h                 1.81           ch       chalk    0.52
l                 3.74           k                 2.71
r                 6.88





Part Two 

Music and Noise 









CHAPTER I 


Physical Properties of Musical Sounds 

Characteristics of Typical Musical Sound Waves 

Musical sounds, like speech sounds, are also transmitted 
through the air by very complicated wave forms. They are 
characterized by being sustained at definite pitches for com- 
paratively long times. When a change in pitch is made it 
takes place in definite steps called musical intervals, such as 
thirds, fifths, and octaves. For producing certain musical 
effects, occasionally the pitch is changed continuously from 
one position on the musical staff to another, but this is excep- 
tional and not the rule. 

There are two outstanding physical mechanisms for pro- 
ducing musical tones, namely, vibrating strings and vibrating 
air columns. The piano and the violin are examples of the 
first type of mechanism and the pipe organ, the flute and horn 
are examples of the second. Although the human voice is of 
sufficient importance to be considered by itself, it is really a 
mechanism of the second type. 

It is well known that a single note sounded by one of these 
musical instruments contains more than one frequency. The 
lowest component frequency, called the fundamental, usually 
determines the pitch but there is, in addition, a large number 
of component frequencies called harmonics, each one being a 
simple multiple of the fundamental frequency. It is this 
abundance of harmonics that produces the richness of musical 
tones. 




The Electrical Harmonic Analyzer 

Unfortunately no good wave pictures of sounds from
musical instruments seem to be available. Tones from some
of the common musical instruments have been analyzed into
their component frequencies by means of an electrical
harmonic analyzer.¹ In using this instrument to analyze sound
waves, a condenser transmitter is used to transfer the acoustic
wave into an electrical wave which is a faithful copy of the
original. This electrical wave is then sent into a selective
network, the essential feature of which is a sharply tuned
circuit whose frequency of tuning is controlled by varying its
capacity in small steps by means of a pneumatic apparatus
similar to that used in a player piano. Maximum responses
of the circuit occur at frequencies of tuning which coincide
with the frequencies of the components of the complex wave.
A schematic of the analyzer circuit is shown in Fig. 48.

Fig. 48. Schematic Analyzer Circuit.

An automatic photographic recorder registers as a permanent
record the amount of current getting through the
tuned circuit at each frequency. From this record the relative
amplitudes of the components of the complex wave are readily
determined. For convenience of operation an automatic control
apparatus is provided so that it is only necessary to connect
the complex source or sources to be analyzed and press the
starting button. Then the completed record of the analyses
is delivered after the machine has passed through the entire
range of frequencies. In Fig. 49 is shown the record obtained
by means of this machine of the tone from an organ pipe.

Fig. 49. Harmonic Analysis of Tone from Trombone Organ Pipe.

In Fig. 50 are shown the essential mechanical features of
the analyzer. The pneumatic arrangement is a modification
of a player-piano mechanism in which a paper roll of standard
dimensions is used. By proper perforation of the roll special
pneumatic relays are operated in proper sequence to switch
the condensers of the tuned circuit, flash frequency lines on
the record, stop the mechanism after a record has been
completed, rewind the piano roll, and perform other functions
necessary to leave the analyzer in the starting position.

Fig. 50. Arrangement of Pneumatic and Electrical Apparatus.

The photographic recording apparatus consists of the
camera motor for moving the sensitized record paper at a
constant rate, a proper arrangement of lenses and lamps for
illuminating the mirror galvanometer and tracing the scale
and frequency lines, and suitable baths for developing and
fixing the record. The record is drawn through the mechanism
by means of the two motor-driven rubber rollers which
serve also to remove excess solution. A more detailed description
of this apparatus will be found in the paper by Wegel and
Moore, the inventors of this analyzer.

¹ Wegel, R. L., and Moore, C. R., “An Electrical Frequency Analyzer,” published
in The Bell System Technical Journal, April, 1924.


Acoustic Spectra of Typical Musical Instruments 

The next four figures, 51, 52, 53, and 54, show the results
obtained by means of this apparatus for the analysis of some
musical tones. The acoustic spectrum shown in each case
was obtained directly from an experimental chart similar to
that shown in Fig. 49. The wave picture was then constructed
from the component frequencies, assuming that all of the
components had the same phase. Due to the time required
to make a complete analysis of sung vowels and piano tones,
the sound is necessarily interrupted but this does not affect
the final result. It is seen that the low-pitched piano tone
has a large number of harmonics. For all of the strings in the
lower pitch register on the piano, the tone is largely carried
by the harmonics rather than the fundamentals. It is
interesting to note that the component frequencies between 2500
and 3000 in the case of the clarinet are very much magnified.
It is seen that the tenth harmonic has about one-half the
amplitude of the fundamental. Also, for the 'cello organ pipe
the third harmonic has about five times the amplitude of the
fundamental. The trombone organ pipe is very rich in
harmonics. Experiments made with Bourdon organ pipes show
that the amplitudes of the harmonic frequencies are quite small
compared to the fundamental.
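
The construction of a wave picture from an acoustic spectrum, with all components taken in the same phase, can be sketched as follows. This is only an illustration of the idea: the fundamental frequency and the list of amplitudes are hypothetical (only the ratio of about five to one between third harmonic and fundamental is taken from the remark on the 'cello organ pipe), and the function name is an assumption introduced here.

```python
import math

def synthesize_period(fundamental_cycles, amplitudes, n_samples=200):
    """One period of a wave built from harmonics that all start in phase.

    amplitudes[0] is the fundamental, amplitudes[1] the second harmonic,
    and so on; every component is a sine wave with zero starting phase.
    """
    period = 1.0 / fundamental_cycles
    wave = []
    for i in range(n_samples):
        t = period * i / n_samples
        value = sum(
            amplitude * math.sin(2.0 * math.pi * (k + 1) * fundamental_cycles * t)
            for k, amplitude in enumerate(amplitudes)
        )
        wave.append(value)
    return wave

# Hypothetical spectrum: third harmonic five times the fundamental;
# the second-harmonic amplitude and the 128-cycle pitch are made up.
wave = synthesize_period(128.0, [1.0, 2.0, 5.0])
print(len(wave), wave[0])
```

Since the analyzer records amplitudes only, a different choice of phases would give a differently shaped wave picture with the same spectrum.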



Figure 51. (Axes: frequency and amplitude.)

Figures 52 to 54. Acoustic spectra of instruments, including Clarinet c′.


Ranges of Frequency and Intensity for Music 

A similar analysis was made for typical pipes taken from
the pipe organ. The average results obtained from this
analysis are shown in Fig. 55. The ordinates give the average
pressure variation in bars which is produced in an ordinary
room when the pipes having a fundamental frequency equal
to the abscissas are blown with a pressure equivalent to that
ordinarily used. These absolute values will depend very
largely upon the type of the room and also upon its size. It
is seen that the maximum pressure variations in the air are
produced by those organ pipes having a pitch of 64 cycles.
On account of the large content of energy carried by the lower
frequencies in organ music, it is difficult to build a transmission
system which will faithfully reproduce this kind of music.

Figure 55.

Experiments with the pipe organ in the Warner Brothers’
Theatre of New York City indicated that the sensation levels
produced in the main body of the hall were from 40 to 50 db,
which are somewhat lower than those shown in Fig. 55.

To obtain a rough estimate of the range of frequencies and
intensities produced by musical instruments the following
experiments were performed by C. E. Lane of Bell Telephone
Laboratories. To determine the sound pressures created, an
apparatus was used which consisted of a calibrated condenser
transmitter, a vacuum tube amplifier and rectifier, and an
ammeter. From the reading of the meter an average r.m.s.
pressure on the transmitter diaphragm could be determined.
Measurements were made with the lips of the person singing
or with the instrument which he was playing about 18 inches
away from the transmitter. In each case three intensity levels
corresponding to the musical notations, pp, mf, and f, were
determined.

For singing, a part of Wagner’s Pilgrims’ Chorus was used
and the intensities of certain notes were measured. In this
way three bass voices, two tenor voices, three soprano voices,
and three alto voices were measured. These persons did not
have unusually strong voices but represented about the average
found in glee clubs and choirs. In Table XIV the average
results are given. The figures give the pressure variation in
bars created at a distance of 18 inches from the singer. The
powers created by the voice corresponding to these figures
range from 1000 to 30,000 microwatts, which are considerably
higher than those used in conversational speech.

The singing intensity was found to remain approximately
constant from the middle range of pitches to the higher range.
For the low range the intensity fell off rapidly so that the
lowest note that could be sung well produced a pressure
variation of only about one-thirtieth that of the higher range.

In a similar way results were obtained for the various musical
instruments.

Table XV gives the values for the wind instruments for the
case when these instruments are pointed at an angle of 60°
from the pick-up transmitter. It is seen that the stringed
instruments produce, in general, less intensity than the wind
instruments, the violin producing the weakest sounds. The
bass drum produces the greatest intensity. The sound from
this instrument is transmitted through the air by means of
very low frequencies. As a rule the bass instruments produce
the greatest intensities, the tenor and alto the next greatest,
and the soprano the least intensities that are used in music.


TABLE XIV 

                                               Pressure (Bars)      Pitch Range
                                                                   (Octaves above
Instrument                                     pp     mf     f      1 Kilocycle)

Saxophone (C Melody)                           20     26     3?
Trombone                                       12     21     35     -3.5 to -1.0
Cornet                                         ?7            34     -2.6 to -0.1
Clarinet                                       15     27     33     -2.6 to +0.6
Fife                                           12     24     35     -1.5 to +1.8
Baritone Horn                                  17     26     3?     -3.6 to -1.4
Bass Tuba                                      21     34     41     -4.4 to -1.4
Organ Pipe                                            30
Violin                                          8     18     25     -2.5 to +2.0
Banjo                                          24     32     3?
Mandolin                                       13     18     24     -2.5 to +0.4
Bass Viol                                      13     21     31     -4.6 to -3.5
Harp (single notes)                            11     20            -5.0 to +1.6
Harp (chord in G, including thirteen notes)           29     3?
Piano (single notes)                                  23     31     -5.1 to +1.6
Bass Drum                                      29     40     48
Snare Drum                                     10     22     33







The small amount of data available indicates that musical 
instruments are built so that the tones of different pitch will 
have approximately the same loudness as interpreted by the 
ear. Since the ear is relatively insensitive to the low pitches, 
organ pipe tones for this low-pitch range are considerably 
more intense than those produced in the high-pitch range. 
The same is true of the tones from a piano. The mean power 
of the sound during the rendition of an orchestral selection 
varies over wide ranges, sometimes as much as 100,000 to 1. 
This fact makes it very difficult to handle the proper transmission 
of such music. 

The important frequency range for music is from 50 to 
5000 cycles, and for very faithful reproduction frequencies an 
octave below and an octave higher than these limits must be 
transmitted. 
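
The octave arithmetic used here, and in the pitch ranges of the table above, can be checked with a short sketch. It assumes that a pitch quoted in octaves above 1 kilocycle corresponds to a frequency of 1000 x 2^n cycles per second, and that widening a band by an octave at each end halves the lower limit and doubles the upper one. The function names are illustrative only.

```python
def octaves_to_cycles(octaves_above_1kc):
    """Convert a pitch in octaves above 1 kilocycle to cycles per second."""
    return 1000.0 * 2.0 ** octaves_above_1kc


def extend_by_one_octave(low_cycles, high_cycles):
    """Widen a band by one octave below its lower limit and one above its upper."""
    return low_cycles / 2.0, high_cycles * 2.0


if __name__ == "__main__":
    # The important range for music quoted in the text: 50 to 5000 cycles.
    print(extend_by_one_octave(50.0, 5000.0))
    # A tabulated pitch range of -2.5 to +2.0 octaves (the violin entry).
    print(octaves_to_cycles(-2.5), octaves_to_cycles(2.0))
```

Under these assumptions the band for very faithful reproduction works out to 25 to 10,000 cycles, and the violin entry to roughly 177 to 4000 cycles.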



CHAPTER II 


Noise 


Physical Properties of Noise 

Those sounds to which no definite pitch can be assigned 
are usually classified as “noise.” The clapping of hands, the 
rattling of paper, the hammering of typewriters, and the roar 
from the traffic in the street, are typical types of noise. Prac- 
tically all types of sound which cannot be classified as speech 
or musical tones come under this classification. Although 
the noise waves are carried through the air by vibrations similar 
to those transmitting both speech and music, their form is very 
much more complex. The range both in intensity and fre- 
quency is very much greater than for the first two classes of 
sounds discussed. For this reason it is very much more 
difficult to transmit them faithfully by means of any transmission 
system. 

In Fig. 56 is shown a typical wave form of street noise. 
As will be seen, its principal characteristic is the great irregu- 
larity in the vibration. The wave form at the bottom was 
produced by a pure tone having a frequency of 500 cycles per 
second and is given for comparison. 

When transmitting speech or music either directly to an 
audience in a large hall or over an electrical system, such as a 
radio or a telephone system, there is always an interference 
to the proper reception of such speech and music, due to 
other sounds being present. These extraneous sounds which 
serve only to interfere with the proper reception are designated 
by engineers as “noise.” With such a designation, the sound 
may be either periodic or non-periodic as long as it is something 
that would be better eliminated. In telephony noises result 
from a number of different sources. Some of these noises 
arise from inductive effects between telephone lines and other 
types of electrical transmission lines; other noises are caused 
by electrical disturbances originating within the telephone 
system itself. In addition to these, there is always, of course, 
a certain amount of noise, generally classed as “room noise,” 
in places where telephones are used. 

Fig. 56. Typical Wave Form of a Street Noise and a Pure Tone of 500 Cycles. 

A sample spectrum of line noise current is shown in Fig. 57. 
This was obtained by analyzing the current flowing in an open 
wire toll line which was terminated by a resistance of 700 
ohms. It is seen that the components, with one exception, 
are all harmonics of a 60-cycle fundamental. Also it will be 
noticed that the odd harmonics are much stronger than the 
even ones, which is a notable characteristic of a power gen- 
erator. It is evident then that the principal part of this line 
noise current is due to the inductive effect of a power line 
carrying a 60-cycle current upon this particular telephone line. 
When a subscriber’s circuit is connected to such a toll line this 
line noise current produces a sound in the telephone receiver 
called “line noise.” It sounds like a hum having a definite 
pitch and for that reason it might be classed as a musical tone. 
However, due to the interference which it causes to the proper 
recognition of the transmitted speech sounds, it is classed as 
noise. 
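
The reasoning applied to the line noise spectrum (components falling on multiples of 60 cycles, with the odd multiples predominating) can be sketched as a simple classification. The component frequencies below are invented for illustration and are not read from Fig. 57; the tolerance of 1 cycle is likewise an assumption.

```python
def classify_components(frequencies, fundamental=60.0, tolerance=1.0):
    """Label each component as an odd harmonic, even harmonic, or non-harmonic.

    Returns a list of (frequency, harmonic_number_or_None, label) tuples.
    """
    result = []
    for f in frequencies:
        n = round(f / fundamental)  # nearest whole multiple of the fundamental
        if n >= 1 and abs(f - n * fundamental) <= tolerance:
            label = "odd" if n % 2 else "even"
            result.append((f, n, label))
        else:
            result.append((f, None, "non-harmonic"))
    return result


if __name__ == "__main__":
    # Hypothetical spectrum: mostly odd harmonics of 60 cycles, one exception.
    sample = [60, 180, 240, 300, 420, 1000]
    for row in classify_components(sample):
        print(row)
```

A spectrum in which nearly every component is labeled as a harmonic, and the odd ones carry most of the strength, points to induction from a 60-cycle power line, as the text concludes.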

Other types of electrical disturbances which may be picked 
up or which may originate in the telephone system itself 
produce sounds at the receiver which are more nearly true 
noise sounds as they have no periodic characteristics. 

The room or booth on the receiving end of the line always 
has some noise present. It varies in character from the 
ticking of a clock in a quiet country home to the intense noise 
in a booth on the platform of a subway railway station. The 
type of room noise which is the most characteristic is that 
which is usually characterized as “roar” from the street. For 
some purposes it is desirable to know the average frequency 
content of the room noise. Its character is so varied, however, 
that for most purposes it is usually assumed that the energy 
of the room noise is scattered uniformly throughout the 
important range of speech frequencies. 

Fig. 57. Line Noise Spectrum. 

The interference to the person carrying on a telephone 
conversation is due mainly to the room noise getting to the 
telephone ear, that is, to the ear on which the telephone 
receiver is placed. As will be shown later, only a small amount 
of interference to the proper reception of speech is caused by 
any noise in the non-telephone ear. Even holding the telephone 
receiver as tightly as possible to the ear does not entirely eliminate 
the noise. 

In radio transmission “room noise” affects the received 
sounds in the same way that it does in telephone transmission, 
but the line noise consists of static, squeals, and howls from 
spark and regenerative sets, which are improperly operated, as 
well as from sources in the radio system itself. In this case 
also, it is usually assumed that such noises are scattered uniformly 
throughout the audible frequency range, except for 
such frequency selectivity as is imposed by the characteristics 
of the transmission system. 

When there is no obvious way of reducing the noise level, 
it is necessary to raise the level of the speech or music being 
transmitted. For example, the best remedy which has yet 
been proposed for static in the receiving set is the construction 
of powerful broadcasting stations. When a receiving set is 
within a short distance from such stations little trouble is 
experienced from static noises. Similarly, in the telephone 
plant it has been necessary to supply to the average sub- 
scriber more than one thousand times as much power as 
would be necessary for good hearing if the line and room 
noises were eliminated. 

Method of Measuring Noise 

A method of gauging line noise is to measure the electrical 
currents induced in the telephone transmission system when 
it is in operating condition but not transmitting speech. If 
a meter were designed to give the proper importance to the 
various frequency components in the noise currents, its 
reading would be an indication of the detrimental effect of the 
noise currents on the line. 

For measuring room noise a high quality telephone system 
similar to that for recording speech sounds described in Part 
One, Chapter II, could be used. Such systems have been tried 
but they have been found to be impractical for most purposes 
because the apparatus is too bulky and requires expert attention 
to keep it in adjustment. Instruments have proved to be 
serviceable which measure the deafening effect of the noise 
upon the ear. For this reason any instrument which has been 
designed for measuring the acuity of hearing can be used, with 
slight modifications, for measuring room noise. 

One such instrument which has proved very serviceable as 
a rough indication in this connection is known as a buzzer type 
audiometer. A picture of this instrument is shown in Fig. 58. 
In this instrument a buzzer element generates an electrical 
current having component frequencies scattered throughout 
the entire speech range. This generator is connected through 
a system of networks, called an attenuator, to a telephone 
receiver which has a special cap designed to hold the receiver 
at a fixed distance from the ear. To make a measurement of 
the room noise, the intensity of the sound issuing from the 
receiver is reduced by turning the dial of the attenuator until 
the receiver sound is masked by the noise present in the room. 
In other words, the threshold of audibility of the tone from 
the audiometer determines the amount of noise in the room. 
The difference between the threshold setting obtained in a 
noisy place and that obtained in a quiet place by an individual 
with normal hearing gives the deafening effect of the room 
noise for the buzzer tone. 

Figure 58. 

A similar principle can be used to determine the deafening 
effect of noise for each frequency. Such an instrument, which 
was designed primarily for measuring the degree of deafness, 
is shown in Fig. 59 and is known as the Western Electric 2-A 
Audiometer. It is designed to produce any one of eight pure 
tones ranging in frequency from 64 to 8192 cycles per second 
at definite sensation levels which are obtained by properly 
setting the dial. A curve showing the loss of hearing at each 
frequency is called an audiogram. For this reason it is convenient 
to refer to a curve showing the deafening effect at 
each frequency due to noise as a noise audiogram. Such a 
curve if obtained for all frequencies would be more directly 
correlated to the effect upon the recognition of sounds than is 
an acoustic spectrum of the noise. 

Figure 59. 
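
The construction of a noise audiogram from two sets of threshold settings can be sketched as follows. The sketch assumes that the eight test tones are spaced an octave apart (the text gives only their number and limits), that the dial settings are read in db, and that the deafening effect at each tone is the noisy-place setting minus the quiet-place setting. The threshold values are hypothetical.

```python
# Eight test tones from 64 to 8192 cycles per second.
# Octave spacing is an assumption; the text states only the count and the limits.
TEST_TONES = [64 * 2 ** k for k in range(8)]


def noise_audiogram(quiet_thresholds, noisy_thresholds):
    """Return {frequency: deafening effect in db} for each test tone.

    The deafening effect is taken as the threshold setting found in the
    noisy place minus the setting found in a quiet place.
    """
    return {f: noisy_thresholds[f] - quiet_thresholds[f] for f in TEST_TONES}


if __name__ == "__main__":
    quiet = {f: 0 for f in TEST_TONES}
    # Hypothetical settings obtained in a noisy room, one per test tone.
    noisy = dict(zip(TEST_TONES, [20, 30, 35, 35, 30, 25, 15, 10]))
    for frequency, effect in noise_audiogram(quiet, noisy).items():
        print(frequency, effect)
```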





When single frequency components are present in the 
noise being measured and single frequency tones are used for 
measuring the noise audiograms, beats are produced which 
make the proper location of the threshold rather difficult. 
For this reason, a type of noise meter has been developed 
which produces bands of frequencies instead of single fre- 
quencies. By its use, more accurate noise audiograms can be 
obtained. These bands of frequencies are produced by 
means of another type of portable audiometer known as the 
phonograph audiometer. This instrument consists of a phono- 
graph turntable, special records, and an electrical reproducer 
connected to a telephone receiver. An oscillating circuit in 
which is placed a variable condenser was designed so that 
bands of frequencies of any width and with the components 
spaced at any desired frequency interval could be produced. 
These bands of frequencies were then recorded on phonograph 
records by means of the new electrical process. From these 
they are reproduced by means of the electromagnetic repro- 
ducer and sent to the receiver which has the special receiver 
cap described above. The method of determining the masking 
effect of the noise is the same as that given for the other two 
audiometers. In practice it is difficult to measure the mask- 
ing effects of varying noises. 

Instead of using the deafening effect as a measure of 
noise, another method is to produce an artificial noise which 
is judged by the observer to have the same interfering effect 
as that existing. The measurement of noise by this means 
is called the “balance method.” It may be used either for 
measuring line noise issuing from the telephone receiver at 
the end of the line or for measuring room noise. The 
difficulty with such a method is due to the inability of a person 
to judge accurately when two sounds which differ greatly 
in character are equally loud. The phonograph audiometer 
is very suitable for making such measurements if records are 
available which have a character of noise similar to the type 
which it is desired to measure. 





Results of Noise Surveys 


In all of these instruments the unit showing the degree 
of the deafening effect is the db or the sensation unit. For 
example, Dr. E. E. Free¹ found that the noisiest place in New 
York City was at Thirty-fourth Street and Sixth Avenue, 
where the buzzer audiometer registered 50 db. This means 
that sounds originally near the threshold of hearing which one 
desires to hear at this place must be magnified 100,000 times 
before they can be heard. In many places the noise becomes 
much greater than this. 

Figure 60. Noise audiograms (horizontal axis: frequency, 64 to 8192 cycles per second). 

Measurements in a number of offices and places of business 
in New York City have indicated that the deafening effect 
usually encountered is about 30 db. For this reason, a man 
who is permanently deafened by this amount will scarcely 
notice his defect except when he goes into a quiet place such 
as a church or a theater. Such a deafened person would also 
have little difficulty in using the telephone, for the system is 
designed to reproduce speech sufficiently loud to override 
the deafening effect of the noise. 

¹ Free, E. E., “Noises You Never Hear,” Pop. Sci. Monthly, v. 109, pp. 16-17, 
August, 1926. 
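
The relation between the db readings quoted above and the corresponding magnification can be sketched in a few lines. The sketch assumes the usual definition, in which a difference of N db corresponds to an intensity ratio of 10^(N/10); this agrees with the statement that 50 db means a magnification of 100,000 times. The function names are illustrative.

```python
import math


def db_to_intensity_ratio(db):
    """Intensity (power) ratio corresponding to a level difference in db."""
    return 10.0 ** (db / 10.0)


def intensity_ratio_to_db(ratio):
    """Level difference in db corresponding to an intensity (power) ratio."""
    return 10.0 * math.log10(ratio)


if __name__ == "__main__":
    print(db_to_intensity_ratio(50))    # street-corner reading quoted in the text
    print(db_to_intensity_ratio(30))    # usual office reading quoted in the text
    print(intensity_ratio_to_db(1000))  # a thousandfold power margin, in db
```

On this reckoning the usual office figure of 30 db corresponds to a ratio of one thousand, the same order as the power margin mentioned earlier for the telephone plant.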

To illustrate the type of noise audiograms which one would 
obtain for certain kinds of noise, four such audiograms are 
shown in Fig. 60. As indicated, the first is for typical street 
noise, the second for noise coming from the typewriter which 
is being rapidly operated, the third for a low-pitched whistle, 
and the fourth for a high-pitched whistle. 








Part Three 

Hearing 




CHAPTER I 


Mechanism of Hearing 

How we hear has been a subject for discussion by men in 
the various branches of science for a long time. Although 
there is good agreement concerning the principal structures 
of the ear, there is still considerable controversy regarding 
the function of the various parts. Of the five senses it is hear- 
ing that makes us aware of the presence of physical disturbances 
called sound waves. For audition purposes sound may be 
classified into two groups, namely, pure tones and complex 
sounds. A pure tone is specified by two properties, namely, 
the pitch and the loudness. These sensory properties are 
directly related to the physical properties, frequency and inten- 
sity of vibration of the air particles near the ear. Some 
psychologists state that there is a third sensory property of 
a pure tone, namely, volume or extension of the tone. It is 
related to both intensity and pitch although the qualitative 
relationship has not been definitely established. Complex 
sounds may be considered as combinations and variations of 
pure tones. Vibrations in the sound wave communicate 
mechanical vibrations to the ear drum which, in turn, com- 
municates the vibration to the inner ear where the nerve 
endings are excited. 

Description of the Organs of Hearing 

The ear mechanism may be divided into three general parts: 
the outer ear, the middle ear, and the inner ear. The outer 
ear consists of the external part or pinna, and the ear canal 
or auditory meatus. The middle ear contains three small 
bones or ossicles called, respectively, the hammer, the anvil, 
and the stirrup. The inner ear contains the cochlea, vestibule, 
the semi-circular canals, and the endolymphatic duct and sac. 
In the cochlea are located nerves which give us the sense of 
hearing, and in the semi-circular canals are located nerves 
which cause reactions concerned with the maintenance of 
equilibrium. 

Fig. 61. Semi-diagrammatic Section through the Right Ear (Czermak). G, external 
auditory meatus; T, membrana tympani; P, tympanic cavity; o, fenestra ovalis; 
r, fenestra rotunda; B, semi-circular canal; S, cochlea; Vt, scala vestibuli; 
Pt, scala tympani; E, Eustachian tube; R, pinna. 

Figure 61 shows a schematic diagram of the parts of the 
ear with the inner ear much enlarged. The pinna is used by a 
number of animals to aid in collecting the sound. The human 
pinna has almost lost this function but a cupped hand held to 
the ear sometimes supplants it. 

The ear canal, or auditory meatus, G, is about three centimeters 
long. It is closed at the inner end by the ear drum 
or tympanic membrane. Attached to the drum from its 
center and upwards by a long part called the handle is the 
first of the ossicles, called the hammer. The top of the ham- 
mer is connected with the anvil by a joint and the anvil in 
turn is connected to the stirrup, the small bone that conveys 
the motion through the oval window to the labyrinth in the 
inner ear. The part of the stirrup lying in the oval window 
is flat and is called the foot plate. It is held in place by an 
annular ligament of the membrane which prevents the fluid 
of the inner ear from coming into the middle ear. The mastoid 
cells are connected to the middle ear but are not concerned 
with hearing. 

The inner ear has a dense bony wall forming an irregular 
cavity referred to as the bony labyrinth and is filled with 
fluid. It contains a smaller structure of the same general 
shape called the membranous labyrinth which contains a 
fluid that is separate and distinct from the rest of the fluid 
in the bony structure. Its walls are formed by a very soft 
membrane so that sound waves pass through them with little 
obstruction. The cavity of the inner ear is encased in solid 
bone and has only two small openings into the middle ear, one 
at the oval window into which fits the stirrup, and one at the 
round window indicated at r. An elastic membrane is stretched 
across the round window and is sometimes referred to as the 
secondary ear drum. The middle ear is connected to the 
outside air by means of a small tube called the Eustachian tube, 
which opens into the upper part of the throat behind the nasal 
cavity. Infectious germs sometimes travel up this tube from 
the nasal cavity and cause a “gathering” in the middle ear. 

The inner ear consists of three principal parts, namely: 
(1) the semi-circular canals which take no part in the mechanism 
of hearing, but serve as an organ of balance, (2) the 
vestibule, the space just behind the oval window, and (3) the 
cochlea which is really the end organ of hearing. Cross-sections 
of the cochlea as it twists into a relatively long spiral 
of two and three-quarter turns like a snail shell are indicated 
at S in Fig. 61. The center of the spiral is a bone called the 
modiolus, and is perforated to allow space for the auditory 
nerve. The nerve enters the base of the cochlea and outside 
it unites with the nerves from the semi-circular canals into 
two parts forming the eighth cranial nerve. The cochlea 
is divided along its length into three parts by the basilar 
membrane and Reissner’s membrane. These form three 
parallel canals which are wound into the spiral. A cross-section 
showing the shape of these canals is given in Fig. 62. 
The oval window is at one end of the scala vestibuli and the 
round window at the end of the scala tympani. 

Fig. 62. Cochlea in Transverse Section. Observe especially the canal of 
the cochlea which is a part of the membranous labyrinth. (Testut.) 

As indicated in this figure, the canals are called the scala 
media or canal of cochlea, scala tympani, and scala vestibuli. 
As stated before, the membrane of Reissner is a very thin 
flexible membrane which will very readily pass any sound 
waves, so that from a dynamical consideration, the canal of 
cochlea and scala vestibuli may be considered as a single 
chamber filled with fluid. The partition between the scala 
tympani and the other two chambers is composed of a bony 
projection called the lamina spiralis for about half the distance, 
the remainder being a flexible membrane called the basilar 
membrane. It is seen from this figure that if any vibratory 
energy is communicated from one side of this partition to the 
other, it must vibrate the basilar membrane. On one side of 
the basilar membrane is the organ of Corti, which contains 
the nerve terminals in the form of small hairs extending into 
the canal of cochlea. Attached to the lamina spiralis and 
lying over the hair cells is another soft loose membrane called 
the tectorial membrane. The details of this part of the 
inner ear are made clearer by Fig. 63, which is a greatly magni- 
fied cross-section of these two membranes. It is seen from 
this figure that there are five rows of hair cells at the terminals 
of the so-called rods. There are about 5000 rods in each of 
the four outer rows and about 3500 in the inner row, making 
a total of about 23,500 rods. At the end of each rod there is 
a hair cell from which project twelve to fifteen hair cilia 
into the liquid of the cochlea. When a sound excites the sense 
of hearing there is a relative motion between the basilar 
membrane and the tectorial membrane which causes the hair 
cells to stimulate the nerve endings at their base. The base 
of the inner rod of Corti is supported on the edge of the bony 
projection, called the lamina spiralis. For this reason, according 
to some authors, the motion which stimulates the hair 
cells is a lateral one between the rods and the tectorial membrane 
due to the rocking motion of the former. According to 
Dr. Shambaugh, the stimulation is principally due to the 
vibration of the tectorial membrane. Helmholtz, as well as 
many other writers on the subject, assumed that the basilar 
membrane was the principal vibrator carrying the rods of 
Corti with it. Thus the hair cells are excited by their relative 
motion to the tectorial membrane. Any of these points of 
view are still possible even if we assume that the ends of the 
hair cells are imbedded in the tectorial membrane. In any 
case the nerve endings are stimulated when a sound vibration 
is conducted from the canal of cochlea to the scala tympani. 

Fig. 63. Corti’s Organ (after Retzius). The tectorial membrane is shown 
contracted in the process of hardening the tissue, and torn away from 
the plateau of Corti. 

A nervous impulse is then conducted by means of nerve 
fibres through the base of the rods to the cochlear nerve and 
then to the brain, causing the sensation of hearing. The two 
chambers separated by the basilar membrane are connected 
by a small opening at the apex of the cochlea called the 
helicotrema. 


The drum of the ear and the ossicles of the middle ear act 
as a sort of transformer to communicate the vibratory energy 
from the air, a light medium, into the liquid, a dense medium. 
Due to the fact that the area of the stirrup which plunges 
into the fluid of the inner ear is about one-twentieth of that 
of the ear drum and also due to the lever action of the three 
bones, the pressure exerted by the oval window of the middle 
ear upon the fluid of the inner ear is from thirty to sixty times 
that exerted by the air upon the ear drum. This transformer 
[...] very different from that offered by the same area in a 
large body of water. 
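
The pressure figures given for the middle ear can be examined with a small sketch. It treats the drum, ossicles, and stirrup as an ideal, lossless piston-and-lever system, which is a simplifying assumption and not a statement from the text. The text supplies the area ratio (about twenty) and the overall pressure factor (thirty to sixty); the lever factor computed below is only the value implied by those two figures.

```python
def pressure_gain(area_ratio, lever_ratio):
    """Pressure multiplication of an ideal piston-and-lever transformer.

    area_ratio  : area of the ear drum divided by area of the stirrup foot
    lever_ratio : force multiplication supplied by the lever action
    """
    return area_ratio * lever_ratio


def implied_lever_ratio(total_gain, area_ratio):
    """Lever factor needed to reach a given overall pressure gain."""
    return total_gain / area_ratio


if __name__ == "__main__":
    area_ratio = 20.0  # stirrup area about one-twentieth of the drum area
    for total in (30.0, 60.0):
        print(total, implied_lever_ratio(total, area_ratio))
```

With an area ratio of twenty, the quoted range of thirty to sixty corresponds under this model to a lever factor between 1.5 and 3.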





The relative sizes of the parts of the inner ear may be 
judged from Fig. 64,¹ which shows the cochlea uncoiled. It is 
seen that the length of the uncoiled cochlea is about 31 millimeters. 
The cross-sections of the cochlear passages on each 
side of the basilar membrane vary as one goes from the oval 
window to the helicotrema as indicated in the figure. The 
area of the stapes where it fits into the oval window is seen 
to be about 3 square millimeters. The opening between the two 
chambers at the helicotrema is about one-quarter square millimeter 
[...] less than 2 square millimeters. These figures emphasize the 
fact that this important mechanism of hearing is really very 
small. As mentioned above, the nerve terminals are scattered 
along the basilar membrane, and all the differentiations of 
complex sounds which are heard are made possible by the 
corresponding stimulation patterns produced in this membrane 
only one-quarter millimeter wide and about 31 millimeters long. 

Fig. 64. A. The Dimensions and Shape of the Human Basilar Membrane and 
Bony Spiral Lamina. B. Diagram of the Sectional Areas of the Cochlear 
Passages (drawn to scale). In each case the actual measurements and 
scale of measurements are given. 

¹ The dimensions for this figure were obtained from the book by Wrightson and 
Keith entitled “An Inquiry into the Analytical Mechanism of the Internal Ear.” 




Functions of the Various Parts of the Ear While Sensing a Sound 

There have been many theories proposed which describe 
the various functions performed by the different parts of the 
ear when sensing a sound. Most of these theories originated 
before there was much quantitative data concerning the facts 
of audition. During the last few years, due to the accumulation 
of such data, the evidence has been overwhelmingly in 
favor of a theory which is an extension of that originally proposed 
by Helmholtz. Only this theory will be described here. 
Those interested in some of the other theories will find good 
discussions of them in various recent publications.¹ 

The Helmholtz theory is frequently called the resonance 
theory or the Harp theory. Either of these names gives rise 
to a wrong conception as to what Helmholtz really intended. 
In order to obtain a true picture of the Helmholtz theory I am 
giving below some abstracts taken from his book entitled 
“Sensations of Tone.” These are sufficient to give the 
essential elements of his theory. 

“When the drumskin is driven inwards by increased pressure of 
air in the auditory passage, it also forces the auditory ossicles inwards, 
as already explained, and as a consequence the foot of the stirrup 
penetrates deeper into the oval window. The fluid of the labyrinth, 
being surrounded in all other places by firm bony walls, has only 
one means of escape, the round window with its yielding membrane. 
To reach it, the fluid of the labyrinth must either pass through the 
helicotrema, the narrow opening at the vertex of the cochlea, flowing 
over from the vestibule gallery into the drum gallery, or, as it would 
probably not have sufficient time to do this in the case of sonorous 
vibrations, press the membranous partition of the cochlea against 
the drum gallery. The converse action must take place when the 
air in the auditory passage is rarefied. 

¹ Boring, E. G., “Auditory Theory with Special Reference to Intensity, Volume, 
and Localization,” American Journal of Psychology, April, 1926, Vol. XXXVII. 
Fletcher, H., “Physical Measurements of Audition and Their Bearing on the 
Theory of Hearing,” Journal of the Franklin Institute, Vol. 196, No. 3, September, 1923. 
Knudsen, V. O., and Jones, I. H., “Facts and Theories of Audition,” Annals of 
Otology, Rhinology and Laryngology, December, 1925, and March, 1926. 
Wilkinson, George, and Gray, Albert A., “The Mechanism of the Cochlea.” 
Wilkinson, George, “Is the Question of Analysis of Sound by Resonance in the 
Cochlea or by Central Analysis Still an Open One?,” American Journal of Psychology, 
April, 1927, Vol. XXXVIII. 

“Hence the sonorous vibrations of the air in the outer auditory 
passage are finally transferred to the membranes of the labyrinth, 
more especially those of the cochlea, and to the expansion of the 
nerves upon them.” 


“Hence when we hereafter speak of individual parts of the ear 
vibrating sympathetically with a determinate tone, we mean that 
they are set into strongest motion by that tone, but are also set 
into vibration less strongly by tones of nearly the same pitch, and 
that this sympathetic vibration is still sensible for the interval of a 
Semitone. Figure 65 may serve to give a general conception of the 
law by which the intensity of the sympathetic vibration decreases, 
as the difference of pitch increases. The horizontal line a b c represents 
a portion of the musical scale, each of the lengths a b and b c 
standing for a whole (equally tempered) Tone. Suppose that the 
body which vibrates sympathetically has been tuned to the tone b 
and that the vertical line b d represents the maximum of intensity 
of tone which it can attain when excited by a tone in perfect unison 
with it. On the base line, intervals of 1/10 of a whole Tone are 
set off, and the vertical lines drawn through them show the corresponding 
intensity of the tone in the body which vibrates sympathetically, 
when the exciting tone differs from a unison by the corresponding 
interval.” 


“Under these circumstances the parts of the membrane in unison 
with higher tones must be looked for near the round window, and 
those with the deeper, near the vertex of the cochlea, as Hensen also 
concluded from his measurements. That such short strings should 
be capable of corresponding with such deep tones, must be explained 
by their being loaded in the basilar membrane with all kinds of solid 
formations; the fluid of both galleries in the cochlea must also be 
considered as weighting the membrane, because it cannot move with- 
out a kind of wave motion in that fluid.” 


“The 4200 Corti’s arches appear then, in this respect, to be 
enough to apprehend distinctions of this amount of delicacy. But 
even if it should be found that many more than 4200 degrees of 
pitch could be distinguished in the Octave, it would not prejudice 
our assumption. For if a simple tone is struck having a pitch 
between those of two adjacent Corti’s arches, it would set them 
both in sympathetic vibration, and that arch would vibrate the more 
strongly which was nearest in pitch to the proper tone. The small- 
ness of the interval between the pitches of two fibres still distinguish- 
able, will therefore finally depend upon the delicacy with which the 
different forces of the vibrations excited can be compared. And 
we have thus also an explanation of the fact that as the pitch of an 
external tone rises continuously, our sensations also alter continu- 
ously and not by jumps, as must be the case if only one of Corti’s 
arches were set in sympathetic motion at once.” 


“The sensation of different pitch would consequently be a sensation 
in different nerve fibres. The sensation of a quality of tone would 
depend upon the power of a given compound tone to set in vibration 
not only those of Corti’s arches which correspond to its prime tone, 
but also a series of other arches, and hence to excite sensation in 
several different groups of nerve fibres. 

“Physiologically it should be observed that the present assumption 
reduces sensations which differ qualitatively according to pitch and 
quality of tone, to a difference in the nerve fibres which are excited.” 


It is seen that according to this view when a sound wave 
impinges upon the ear drum its vibrational motion is commu- 
nicated through the middle ear, and, as stated above, its ampli- 
tude is decreased to about one-sixtieth and the force or pressure 
variation increased correspondingly as it enters the inner ear 
or cochlea. Here the vibration is communicated to the fluid 
of the scala vestibuli. If the pitch of the tone is low, say, 
below 20 cycles per second, this fluid in the scala vestibuli 
and the scala tympani is moved bodily back and forth through 
the helicotrema. The motion between the round window and 
the oval window is just opposite in phase, the former moving 
inward while the latter moves outward. Thus, at the very 
low frequencies the mass reaction of the fluid is not sufficient 
to cause any appreciable transverse motion of the basilar 
membrane and consequently limits the lower pitch range of 
audibility. 

At the very high frequencies the mass of the ossicles is so 
great that very little energy can be transmitted to the cochlea. 
When the elastic forces are negligible, it requires a force ten 
thousand times larger to produce a given amplitude at 10,000 
cycles than at 100 cycles. For this reason it is probable that 
the factor which controls the upper limit of pitch audibility 
is the mass reactions involved in the ear rather than any lack 
of nerve sensitivity. For intermediate frequencies the mass 
reactions, the elastic restoring forces, and the frictional resist- 
ances which are brought into play are such that the vibratory 
energy is transmitted through the basilar membrane¹ at certain 
points causing the nerves to be excited. 
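
The ten-thousand-fold figure can be checked from the force needed to drive a mass sinusoidally when elastic forces are neglected: the peak force is proportional to the mass, the amplitude, and the square of the frequency. The following is a minimal sketch in Python; the mass and amplitude are arbitrary placeholders (they cancel in the ratio) and the function name is illustrative only.

```python
import math

def peak_force(mass, amplitude, frequency_hz):
    """Peak force needed to move `mass` sinusoidally with the given amplitude,
    assuming mass reaction only (no elastic or frictional forces)."""
    omega = 2 * math.pi * frequency_hz
    return mass * omega ** 2 * amplitude

# Placeholder mass and amplitude; both cancel in the ratio.
ratio = peak_force(1.0, 1.0, 10_000) / peak_force(1.0, 1.0, 100)
print(round(ratio))
```

A hundredfold increase in frequency thus calls for a force (100)² times as large for the same amplitude.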

From experiments on the differential sensitivity of the ear 
for pitch, which will be discussed in Chapter IV, it has been 
calculated that tones of various pitch are sensed by nerves at 
various positions along the basilar membrane as indicated in 
Fig. 66. If a pure tone of vibrational frequency of 1000 is 
communicated to the ear, the energy passes through the 
ossicles of the middle ear and sets the fluid in the scala vestibuli 
into vibration. The vibration is communicated through the 
canal until it gets half way up the cochlea, where, due to 
resonance, the vibrational energy is conducted through the 

¹ As described above, the tectorial membrane lies over the basilar membrane so 
that if either one or the other or both vibrate when a wave is transmitted from the 
scala vestibuli to the scala tympani, the hair cells will be stimulated. In the above 
discussion, these two membranes are considered as one and referred to as the basilar 
membrane. 

basilar membrane and then down through the scala tympani 
to the round window. The nerves which are stimulated most 
are those which are near the mid-point of the basilar mem- 
brane. The pitch or position of a tone on the musical scale is 
then dependent upon the position of the maximum stimulation 
along the basilar membrane. The character of the sound 
depends upon the positions and the relative intensities of 
agitation of the various parts of the basilar membrane. 

Although the mechanics of the cochlea as sketched above 
is undoubtedly correct in its essential feature, it does not 
follow that the brain depends entirely upon the space pattern 
of stimulated nerves for determining the pitch and quality of 



Fig. 66. Characteristic Frequency Regions on the Basilar Membrane. 


tones. Some of the features of the original sound wave may be 
preserved in the combination of the nerve impulses being sent 
to the brain by the individual nerve fibres. The time pattern 
of stimulation which is impressed upon the brain may therefore 
aid in making the proper interpretations. This is certainly 
true for long-time intervals. Some authors state that the 
first of these, namely, the space pattern, is sufficient to account 
for the recognition of pitch and quality while others insist 
that the second of these, namely, the time pattern, is the thing 
which is used to recognize pitch and quality. No doubt they 
both aid in varying amounts depending upon the character 
of the sound and the condition under which one listens to it. 

There is some direct experimental evidence which 
indicates in a general way that the positions for sensing the 
tones of various pitches given in Fig. 66 are correct. The 
best-known experiments are those of Yoshii, Wittmaack, and Marx¹ on 
guinea pigs. These guinea pigs were kept in a continuous 
sound of fixed pitch for several hours a day for a long time. 
They were then killed and a histological examination made. 
All these observers noted that the low tones caused a degenera- 
tion of the end organ near the apical end, while the high tones 
caused a degeneration near the basal end, that is, near the 
oval window. They also observed that when tones in the 
middle register were used, a degeneration occurred near the 
middle of the basilar membrane. 

There are three anatomical facts which show how the 
cochlea can act in a mechanical way to separate the tones of 
different pitch in the manner described: first, the basilar 
membrane increases in width more or less regularly and con- 
tinuously from the base to the apex of the cochlea; second, the 
transverse fibres constituting the basilar membrane decrease 
in tension from the basal to the apical end of the cochlea; and 
third, the vibrating mass of fluid becomes greater as the 
stimulated spot goes from the basal to the apical end. 

As a first approximation, we can treat the mechanical 
system as one having a single degree of freedom, that is, having 
a single moving mass constrained by elasticity and resistance. 
The vibrating mass is principally the fluid in the scala vestibuli 
from the oval window to the stimulated spot and back through 
the scala tympani to the round window. From Fig. 64 it was 
estimated that this length varied from 2 millimeters to 60 
millimeters or a ratio of 30 from one end of the basilar mem- 
brane to the other. The elasticity is furnished mainly by the 
transverse fibres of the membrane which have a variation in 
length of 1 to 4. The variation in tension is unknown although 
the structure of the membrane at various positions along its 
length indicates clearly that the tension decreases rapidly 

¹ Yoshii, Zeitschr. f. Ohrenheilk., Bd. 59, 1909, pp. 201-501. 

Wittmaack, Zeitschr. f. Ohrenheilk., Bd. 54, 1907, and Bd. 59, 1909. 

Marx, Zeitschr. f. Ohrenheilk., Bd. 59, 1909. 

from the base to the apical end. Since for such a system the 
frequency is given by 

$$ f = \frac{1}{2\pi}\sqrt{\frac{E}{m}} $$

where E is the elasticity and m the mass, the variation in the 
vibrating mass and in the length of the transverse fibres over 
the length of the basilar membrane would cause a frequency 
variation of 11-fold. Inasmuch as one can hear through a 
frequency range 1000-fold, the tension must vary through a 
range of 8000 to 1. This seems like a rather large variation 
although it is not entirely impossible as is pointed out by 
Wilkinson and Gray.¹ However, the treatment of the cochlea 
as such a simple vibratory system may lead to results which 
are very far from the truth. In the above computation no 
account was taken of the frictional forces which may be the 
controlling forces for such small openings. For example, the 
canal on either side of the membrane is less than a square 
millimeter in cross-section. When all the factors are taken 
into account, it may not require such a large variation in the 
tension of the fibres to satisfactorily cover the frequency range 
that is used in hearing. Some further investigation along this 
line is badly needed. The problem is difficult because exact 
information of the mechanical constants involved is not 
available. 
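
The arithmetic behind the 11-fold and 8000-to-1 figures can be retraced with a short sketch. It assumes the single-degree-of-freedom relation in which frequency varies as the square root of elasticity divided by mass, and it further assumes (this is not stated in the text) that at fixed tension the elasticity of a transverse fibre varies inversely with its length. The variable names are illustrative only.

```python
import math

mass_ratio = 60 / 2      # length of the vibrating fluid column, 2 mm to 60 mm
length_ratio = 4 / 1     # length of the transverse fibres, 1 to 4
hearing_range = 1000     # audible frequency range, expressed as a ratio

# Assumption: at fixed tension, fibre stiffness varies inversely with fibre
# length, so mass and length together change E/m by mass_ratio * length_ratio.
freq_ratio_without_tension = math.sqrt(mass_ratio * length_ratio)

# Frequency varies as the square root of tension, so the remaining factor
# must be squared to find the tension range required.
tension_ratio = (hearing_range / freq_ratio_without_tension) ** 2

print(round(freq_ratio_without_tension, 1))
print(round(tension_ratio))
```

Under these assumptions the mass and fibre length account for a frequency ratio of about 11, and the tension range required comes out somewhat above 8000 to 1, in agreement with the round figures quoted.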

Another fact of importance in the hearing mechanism is 
that the chain of bones in the middle ear has a non-linear 
transmission characteristic. In the process of transmitting 
complex tones this part of the ear acts like a detector tube in 
a radio circuit. When a single tone is transmitted through 
to the cochlea, not only the impressed frequency but, in 
general, all its harmonics are sent into the cochlea, and the 
magnitude of the harmonics becomes greater compared to the 
fundamental as the intensity of the tone is increased. Sim- 
ilarly, when two tones are transmitted, the harmonics of each 
tone and also of the summation and difference tones are 

¹ Wilkinson and Gray, “Mechanism of the Cochlea,” p. 67. 

transmitted to the cochlea. It is thus seen that when loud 
complex sounds are sent into the ear a very complex pattern 
is set up on the basilar membrane. For this reason one might 
be justified in saying that the character of the sound is inter- 
preted according to the pattern existing on the basilar mem- 
brane. A recognition of this fact has led Scripture and others 
to reject the Helmholtz resonance theory and replace it by 
what they call a pressure pattern theory. 
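
The way in which a non-linear element gives rise to harmonics and to summation and difference tones may be illustrated numerically. The sketch below passes two tones through a hypothetical square-law characteristic (output equal to the input plus a small multiple of its square) and measures the amplitude at each frequency of interest. The characteristic, its coefficient, and the tone frequencies are assumptions chosen for illustration; they are not measurements of the middle ear.

```python
import math

RATE = 8000  # samples per second; exactly one second of signal is analysed

def tone(freq_hz, amplitude=1.0):
    """One second of a cosine tone."""
    return [amplitude * math.cos(2 * math.pi * freq_hz * k / RATE) for k in range(RATE)]

def square_law(x, a=0.2):
    """Hypothetical non-linear element: linear term plus a small squared term."""
    return [v + a * v * v for v in x]

def amplitude_at(signal, freq_hz):
    """Amplitude of the component at freq_hz, by correlation with cosine and sine."""
    c = sum(v * math.cos(2 * math.pi * freq_hz * k / RATE) for k, v in enumerate(signal))
    s = sum(v * math.sin(2 * math.pi * freq_hz * k / RATE) for k, v in enumerate(signal))
    return 2 * math.hypot(c, s) / len(signal)

F1, F2 = 700, 1000
mixed = square_law([p + q for p, q in zip(tone(F1), tone(F2))])
spectrum = {f: round(amplitude_at(mixed, f), 3)
            for f in (F2 - F1, F1, F2, 2 * F1, F1 + F2, 2 * F2)}
print(spectrum)

def second_harmonic_ratio(amplitude):
    """Second harmonic relative to the fundamental for a single tone."""
    out = square_law(tone(F1, amplitude))
    return amplitude_at(out, 2 * F1) / amplitude_at(out, F1)

print(round(second_harmonic_ratio(1.0), 3), round(second_harmonic_ratio(2.0), 3))
```

Besides the two impressed tones, components appear at the difference frequency, the summation frequency, and the second harmonic of each tone; doubling the amplitude of a single tone doubles the second harmonic relative to the fundamental for this particular characteristic.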

However, in speaking of the ear mechanism 
as having resonant elements, it must not be understood that 
this resonance is of the same type that would exist in a tuning 
fork or stretched string where the tone exists a considerable 
time after the driving force is removed. Due to the very 
small dimensions in the ear the frictional forces are very 
large and consequently the damping is great. It probably 
is as great, if not greater, than that existing in a telephone 
receiver. Consequently, the hangover, that is, the vibration 
existing after the stimulating tone has ceased, is so small that 
it is almost imperceptible. In listening to a continually vary- 
ing source of sound the form of the vibration of the basilar 
membrane is set up and dies down so quickly that it follows 
changes with no perceptible delay. For this reason when the 
ear mechanism is compared to a harp or a piano a wrong 
impression is usually created. 

Mechanism of Nerve Conduction 

A complete theory of hearing must include an explanation 
of the nerve action which takes place after stimulation. In 
it probably lies the answer to the question: “How does the 
ear sense loudness?” The auditory nerve is very similar to a 
cable trunk. It contains about 3000 medullated nerve fibres, 
each consisting of an “axis cylinder” surrounded by a fatty 
substance called the myelin. The axis of this cylinder has a 
diameter of about .001 centimeter and forms only about 9 per 
cent of the fibre. It is thus seen that nerve fibres are con- 
structed very much like insulated telephone wires and bound 
together in a strikingly similar manner to telephone cables. 
This analogous structure led some physiologists to the con- 
clusion that all nervous impulses were electrical in origin 
and that their transmission was very similar to the electrical 
transmission on telephone lines. This theory, however, was 
found to be untenable. 

Most of the earlier experiments on nerve conduction were 
made with motor nerves so that the mechanism of nerve con- 
duction described here is based upon such experiments by 
several investigators. However, the recent work of Adrian¹ 
shows that the action in the sensory nerves is essentially the 
same. A nervous impulse may be excited by heat; chemical, 
electrical, or mechanical stimuli; or by reflex stimuli. Touch- 
ing the nerve with a red-hot iron, with an acid, or pinching 
it, sets up a nervous impulse. The most common method in 
the physiological laboratories of exciting such an impulse is 
to use the shock obtained from a “make-and-break” induction 
coil. In order to set up such an impulse, the strength of the 
stimulus and its rate of change must be greater than a certain 
minimum. It has been found that the nervous impulse which 
travels along the nerves is not at all analogous to an electrical 
current travelling along a wire. An elemental nerve fibre has 
no impulse at all or else it fires with its full force. In other 
words, in normal nerve fibres the impulse is either of normal 
strength or zero strength throughout its entire course and 
seems to be the same regardless of how it is stimulated. The 
minimum value for starting the full nervous impulse is dif- 
ferent for the different nerve fibres constituting the nerve. 

In describing nerve conduction, physiologists frequently 
say it is similar to what goes on when a gunpowder fuse is 
lighted. The rate of the fire travelling down the fuse and the 
intensity of the heat which it creates are in no way dependent 
upon the way the fuse was lighted at the end. From this point 
of view it is seen to be a necessary condition that the loudness 
produced by a tone exciting the ear must be directly related to 
the number of fibres being excited and the rate at which the 

¹ Book entitled “The Basis of Sensation,” published in 1928. 

excitations occur, since each fibre always carries its maximum 
impulse. It would seem necessary also that the minimum 
stimulus to excite each fibre must differ greatly. 

That this is true is beautifully illustrated by some experi- 
mental work¹ of Porter and Hart. The nerves controlling 
muscle contraction were stimulated by electric shocks. The 
currents producing these shocks were gradually increased. 
The successive contractions of the muscles did not increase 
gradually but in definite steps as shown in Fig. 67. This 
figure is a record taken from their experimental work. The 
height of each line is a measure of each successive contraction. 

If the auditory nerves act in a similar way, then, as a tone 
is gradually increased in intensity from below the threshold 
to loud values, the excitations reaching the brain must increase 
in definite steps, the threshold corresponding to the first nerve 
fibre being excited. When all of the nerve fibres are excited 
and firing at their maximum rate no further increase in loud- 
ness is possible. 

After a nervous impulse has passed down the nerve, there 
is a “refractory” phase during which time the nerve is unable 
to respond or conduct. Then follows a “relative refractory” 
period during which the excitability, the conductivity, and the 
speed of propagation gradually return from zero to normal. 
As the condition of the nerve returns to normal, it overshoots 
the mark and becomes supernormal; that is, it is more sensitive, 
more highly conductive, and the speed of propagation is 
greater. This supernormal condition gradually dies away, 
until the nerve is once again in its normal stage. It is only 
during the relative refractory phases that the nerve conducts 
impulses reduced in magnitude. The length of this refractory 
period has been measured by several observers, and, although 
there is a wide disagreement, the best estimate at the present 
time seems to place it at about .001 second and the relative 
refractory period at about .003 second. According to these 
figures, the maximum number of nervous impulses which a 

¹ Porter, E. L., and Hart, V. W., “Reflex Contractions of an All or None Character 
in the Spinal Cat,” American Journal of Physiology, October, 1923. 

single nerve fibre can send to the brain is 1000 per second. 
Those periodic excitations greater than 300 per second will 
not be transmitted as normal impulses, since each succeeding 
excitation will lie in the relative refractory period. According 
to Adrian’s work the nerve ending has a greater relative refrac- 
tory period than the nerve fibre itself. In his work on sensory 
nerve endings no frequencies of discharge greater than 150 cycles 
per second were observed. Consequently when a pure tone 
having a frequency of 2000 or 3000 cycles excites the ear it is 
probable that the number of nervous impulses being sent to the 
brain per second by each nerve fibre is considerably less than 
the exciting frequency. This is a very important fact, but up to 
the present no direct experimental evidence is available to de- 
termine the rapidity of the impulses being sent to the brain when 
the auditory nerve is excited by tones of various pitch. Experi- 
ments with transmission circuits indicate that no serious 
interference to the recognition of speech or music occurs when 
differences between arrival times of various component fre- 
quencies are less than .001 second. 
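
The rates quoted above follow from taking the reciprocal of the refractory periods. The sketch below is only an arithmetical illustration using the two estimates given in the text; the function names are illustrative, and the test for a "normal" impulse simply asks whether successive excitations are spaced at least one relative refractory period apart.

```python
ABSOLUTE_REFRACTORY_S = 0.001   # estimate quoted in the text
RELATIVE_REFRACTORY_S = 0.003   # estimate quoted in the text

def max_rate(min_interval_s):
    """Greatest number of impulses per second when successive impulses
    must be at least min_interval_s apart."""
    return 1.0 / min_interval_s

ceiling = max_rate(ABSOLUTE_REFRACTORY_S)        # 1000 per second
normal_limit = max_rate(RELATIVE_REFRACTORY_S)   # about 333, given in round numbers as 300

def impulses_are_normal(excitation_hz):
    """True when each excitation falls outside the relative refractory period."""
    return excitation_hz <= normal_limit

print(ceiling, round(normal_limit), impulses_are_normal(250), impulses_are_normal(2000))
```

On these figures a 250-cycle excitation could be followed impulse for impulse at normal strength, whereas a 2000-cycle excitation could not.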

The experimental work of Adrian shows conclusively that 
as the intensity of stimulation increases, the rate of nervous 
discharge also increases. This rate also depends upon the rate 
of change in the intensity of stimulation. To explain some of 
the facts of binaural audition, it seems reasonable to suppose 
that when the nervous discharge takes place it will always 
occur at the same phase of vibration, that is, when the basilar 
membrane is at a maximum amplitude or a maximum velocity 
or at some definite time interval between these two extremes.¹ 
The discharge will not take place at every vibration, but the 
stimulation may be stored up until enough has accumulated 
to produce the discharge. This may occur only once in 
10 or once in 100 vibrations for a given fibre, or it may occur 
when highly stimulated at every vibration, but it cannot ever 

¹ According to Sir Thomas Wrightson (see his book, “An Inquiry into the Analytical 
Mechanism of the Internal Ear”), a nerve stimulation takes place four times during 
each cycle at the crest, at the trough, and at those times when the membrane goes 
through the equilibrium position. 

occur faster than the refractory period. According to this 
view different nerve fibres will discharge at different times 
but always at the same phase of vibration of the basilar mem- 
brane. Consequently, a nerve composed of a bundle of nerve 
fibres will carry a nerve current to the brain which produces 
a time stimulation pattern there that has the same periodicity 
as the sound waves producing it. 
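
How a bundle of fibres, each too slow to follow the tone, may nevertheless carry its periodicity can be shown with a small sketch. It assumes a 3000-cycle tone, a ceiling of 1000 discharges per second for each fibre (from the .001-second refractory period), and a hypothetical bundle of three fibres whose discharges happen to be staggered by one vibration. The staggering is an assumption made for illustration, not an observed fact.

```python
TONE_HZ = 3000        # exciting frequency, cycles per second
FIBRE_CEILING = 1000  # discharges per second allowed by a .001-second refractory period

# Each fibre can respond to at most every third vibration of this tone.
skip = -(-TONE_HZ // FIBRE_CEILING)  # ceiling division

def discharge_cycles(offset, n_cycles):
    """Indices of the vibrations at which one fibre discharges, always at the same phase."""
    return list(range(offset, n_cycles, skip))

N_CYCLES = 30
# Hypothetical bundle of three fibres, staggered by one vibration each.
bundle = [discharge_cycles(offset, N_CYCLES) for offset in range(skip)]
pooled = sorted(c for fibre in bundle for c in fibre)

rate_per_fibre = TONE_HZ / skip
gaps = {b - a for a, b in zip(pooled, pooled[1:])}
print(rate_per_fibre, gaps)
```

No fibre discharges oftener than once in three vibrations, yet the pooled discharges fall one vibration apart, so the bundle as a whole repeats at the period of the sound wave.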

With this picture of the nervous mechanism in mind it is 
not unreasonable to assume that the loudness (magnitude of 
sensation) of a sound is directly related to the total number 
of nervous discharges coming to the brain. This number is 
dependent both on the number of nerve fibres stimulated and 
on the intensity of stimulation of each one. According to this 
view, consider then the loudness effects produced when a 
250-cycle pure tone increases in intensity from the threshold 
of hearing to very high values. At first a single fibre is suf- 
ficiently stimulated to cause nervous discharges at a slow rate. 
As the intensity increases, the rate of these discharges in the 
first nerve fibre increases while at the same time other fibres 
start discharging. This continues until a small patch of nerve 
endings at a position 25 millimeters from the oval window is 
stimulated. In this patch some nerve fibres are firing these 
nervous discharges much faster than others, the rate depending 
upon their distance from the position of maximum stimulation 
and also upon their initial sensitivity. The pitch is determined 
by the position of maximum stimulation and the loudness by 
the total number of discharges from all the nerve fibres in the 
stimulated patch. As the intensity is still further increased, 
other patches at positions corresponding to the subjective tones 
are added to those already stimulated, first at 21 millimeters, 
then at 19 millimeters, then at 17 millimeters from the oval 
window, and so on. These subjective tones are produced dur- 
ing the transmission of the sound through the mechanism of the 
middle ear. At very high intensities these patches overlap so 
that some fibres at all positions along the membrane will be 
discharging. When two tones are impressed upon the ear a 
similar thing happens except that besides the patches due to the 
subjective harmonics, others will appear due to the subjective 
summation and difference tones. For any complex tone a 
corresponding pattern of stimulation will be produced, which 
will depend upon the intensity of the sound received at the 
ear as well as upon its physical characteristics. This pattern 
determines the character of the sound which is perceived. 
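
The view that loudness is related to the total number of discharges may be put in the form of a toy calculation. The sketch below is a hypothetical model and not one given in the text: each fibre is silent below its own threshold, its rate then rises with the stimulus, and no fibre exceeds 1000 discharges per second. The thresholds, the rate law, and the gain are all assumptions.

```python
MAX_RATE = 1000.0  # discharges per second per fibre, from a .001-second refractory period

def fibre_rate(stimulus, threshold, gain=200.0):
    """Hypothetical rate law: silent below threshold, then rising with the
    stimulus until the refractory ceiling is reached."""
    if stimulus < threshold:
        return 0.0
    return min(MAX_RATE, gain * (stimulus - threshold + 1))

def total_discharges(stimulus, thresholds):
    """Loudness proxy: discharges per second summed over every fibre."""
    return sum(fibre_rate(stimulus, t) for t in thresholds)

thresholds = [1, 2, 3, 4, 5]  # hypothetical and deliberately unequal
for level in (0, 1, 3, 10, 20):
    print(level, total_discharges(level, thresholds))
```

The total is zero below the first threshold, grows as more fibres join and as each fires faster, and ceases to grow once every fibre is at its maximum rate.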

This general picture of how the ear works will aid in inter- 
preting the various experimental facts of audition which will 
be discussed in the succeeding chapter. 



CHAPTER II 


Limits of Audition 

When the intensity of a sound is continuously decreased 
it reaches a value where it produces no stimulation of the 
auditory sense. The intensity which is just sufficient to be 
heard is called the “threshold of audibility.” It is the lower 
intensity limit of audition. If the intensity is continuously 
increased it reaches an intensity which stimulates the sensation 
of feeling. This intensity is called the “threshold of feeling.” 
Since intensities higher than this cause pain and injure the 
hearing mechanism, this threshold of feeling serves as a prac- 
tical upper intensity limit to sounds which can be sensed by 
the human ear. If a tone is kept at a given intensity and at 
the same time gradually raised or lowered in pitch it ceases 
to be sensed by the ear at both an upper and a lower pitch 
limit. 

It is the purpose of this chapter to give data concerning the 
limits of audition; and also to describe experiments which 
enabled such data to be taken. 

Threshold Intensity vs. Frequency 

During the past century a number of observers have made 
measurements of the intensity at the threshold of audibility. 
The results of these measurements have been interesting not 
only to physicists, but also to a number of the other scientific 
groups. A description of the methods which have been used 
by some of these investigators may be of interest. 

In 1870 Toepler and Boltzmann¹ made a determination of 
ear sensitivity. The amplitude of vibration of the air particles 

¹ Ann. der Phys., Vol. 141, p. 321, 1870. 

in an organ pipe was determined by light interference methods. 
From the distance to the source at which sound was just audible 
it was possible to determine the amplitude of vibration of the 
tone at the threshold of audibility. 

In 1877 Lord Rayleigh¹ used a whistle as a source of sound 
and calculated the energy emitted by it from the pressure used 
in blowing it. He also used a tuning fork mounted on a 
resonator. From the difference in the decay constants of the 
fork suspended freely in the air and mounted on the resonant 
box it was possible to calculate approximately the energy 
emitted by the box. He made a third measurement using a 
telephone receiver as a source. The deflection of the diaphragm 
for direct current was considered the same as for an alternating 
current when the period was far below the natural period. The 
former was measured microscopically and consequently when 
the volume of air enclosed in the ear is known, it is possible to 
calculate approximately the change in pressure on the ear drum 
from the current flowing in the receiver. 
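
A calculation of this kind may be sketched as follows. It is not Rayleigh's own working: it assumes that the enclosed air is compressed adiabatically, that the walls of the cavity are rigid, and that the cavity is small compared with the wave length, so that the pressure change equals the ratio of specific heats times the atmospheric pressure times the fractional change in volume. The numerical values are hypothetical.

```python
GAMMA_AIR = 1.4            # ratio of specific heats for air
ATMOSPHERE_PA = 101_325.0  # standard atmospheric pressure, pascals

def cavity_pressure_change(volume_displaced, cavity_volume):
    """Pressure change in a small sealed cavity with rigid walls when its volume
    is reduced by volume_displaced (same units as cavity_volume), assuming
    adiabatic compression and a displacement small compared with the cavity."""
    return GAMMA_AIR * ATMOSPHERE_PA * volume_displaced / cavity_volume

# Hypothetical numbers: a cavity of 1 cubic centimeter compressed by one millionth of its volume.
dp = cavity_pressure_change(1e-6, 1.0)
print(dp)
```

The volume displaced would in practice be found from the measured deflection of the diaphragm and its effective area.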

In 1883 Wead² used a vibrating tuning fork in an open 
field as a source of sound. The amplitudes of vibration were 
made large enough to be directly measured. From the decay 
constants and the time elapsed before the tone disappeared, 
the absolute value of the threshold intensity was obtained. 
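
The principle of this method may be sketched under the assumption that the amplitude of the fork dies away exponentially, so that the amplitude at the moment the tone disappears follows from the initial amplitude, the decay constant, and the time elapsed. The numerical values below are hypothetical and are not Wead's data.

```python
import math

def amplitude_after(initial_amplitude, decay_constant, elapsed_s):
    """Amplitude of a freely decaying fork, assuming exponential decay
    A(t) = A0 * exp(-k * t)."""
    return initial_amplitude * math.exp(-decay_constant * elapsed_s)

# Hypothetical: unit initial amplitude, decay constant 0.1 per second,
# tone no longer heard after 60 seconds.
threshold_amplitude = amplitude_after(1.0, 0.1, 60.0)

# Intensity varies as the square of the amplitude.
threshold_intensity_ratio = threshold_amplitude ** 2
print(threshold_amplitude, threshold_intensity_ratio)
```

In this way an amplitude too small to be measured directly is inferred from one large enough to be measured at the start.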

In 1903 Wien³ used a telephone receiver as a source of 
sound, making direct measurements of the amplitude of vibra- 
tion for loud sounds. By assuming that the amplitude 
increases proportionally with the current, it is possible to 
calculate the amplitude of vibration of the diaphragm at the 
threshold of audibility. He observed results through a range 
of frequencies from 50 to 16,000 cycles. These results were 
generally considered the most reliable until the recent work 
using vacuum tubes and thermophones. 

In 1904 Webster⁴ used for his source a so-called “Phone,” 

¹ Proceedings of Royal Society, Vol. 26, p. 248, 1877. 

² American Journal of Science, 151, Vol. 26, p. 177, 1883. 

³ Wien, Archiv für die gesamte Physiologie, 97, pp. 1-57 (1903). 

⁴ Boltzmann, F. L., Festschr., Leipzig, 1904, p. 1866. 

an instrument so constructed that the amount of sound energy 
emitted by it can be calculated. 

In 1905 Abraham¹ used as a source of sound a telephone 
receiver attached to a brass cylinder, the diaphragm forming 
its base and an ear piece its top. The change in pressure in the 
cylinder for a direct current in the receiver was determined by a 
sensitive manometer. He obtained approximately the same 
sensitivity for the two frequencies, 250 and 500 cycles. These 
frequencies were well below the natural period, so that the same 
proportionality factor was used for obtaining the pressure 
change as was obtained by the direct current measurement. 

Due to the importance of knowing the absolute sensitivity 
very accurately, Bell Telephone Laboratories took up the 
problem in 1920 and published the results of their work in 
1922.² During this same period the problem was also being 
worked upon by Kranz whose results were published in 1923.³ 

The methods used in this recent research work are based 
upon the thermophone formula developed by Arnold and 
Crandall⁴ and later modified by Wente.⁵ (See Appendix A.) 
When an alternating electrical current is superimposed upon a 
direct current and sent through a very thin metal strip, it 
generates a sound wave. By means of the formula mentioned 
one can calculate the amplitude of the pressure variation pro- 
duced when the metal strip is enclosed in a small gas chamber. 

The new tools besides the thermophone which made it pos- 
sible to obtain more accurate data concerning threshold inten- 
sities were vacuum tubes (in the form of oscillators, amplifiers, 
and rectifiers), condenser transmitters, and attenuators accu- 
rately calibrated throughout a wide range of intensity. In the 
investigations described in the previous chapters these same 

¹ Comptes Rendus, Vol. 144, p. 1099, 1907. 

² Fletcher, H., and Wegel, R. L., “The Frequency-Sensitivity of Normal Ears,” 
Physical Review, June, 1922. 

³ Kranz, F. W., “Minimum Intensity for Audition,” Physical Review, Vol. 21, 
No. 5, May, 1923. 

⁴ Arnold, H. D., and Crandall, I. B., “The Thermophone as a Precision Source 
of Sound,” Physical Review, Vol. X, No. 1, July, 1917, pp. 22-38. 

⁵ Wente, E. C., Physical Review, Vol. 19, April, 1922, pp. 333-345. 

tools were almost indispensable and are now being used in a 
great many other lines of investigation in acoustics. 

In the work at Bell Telephone Laboratories two methods 
of determining the absolute values of the threshold of audibility 
were used. In the first method advantage was taken of the 
availability of a calibrated high quality telephone system. A 
schematic of such a system is shown in Fig. 68. By adjusting 
the potentiometer it was possible to make the tone coming 
from the system receiver, B (Case 1), sound as loud as that 
produced when the ear was held in the same position as the 
condenser transmitter (Case 2). This reading of the attenua- 
tor was found for all frequencies from 100 to 2000 cycles. Since 



Figure 68. 


the sound energy striking the condenser transmitter was known 
in terms of the voltages generated by it, being previously 
calibrated by means of the thermophone, the energy going 
from the receiver of the system into the ear of the observer 
could be calculated from the potential difference at the ter- 
minals of the condenser transmitter and the reading of the 
attenuator. 

To make a measurement a potential difference of known 
magnitude and frequency was applied at the terminals of the 
condenser transmitter and sufficient attenuation was intro- 
duced into the system to make the sound from the receiver 
inaudible. The attenuation was then gradually removed until 
the sound just became audible. From the amount of attenua- 
tion for this condition and the voltage impressed upon the 
terminals of the condenser transmitter, the pressure variation 
in the ear was calculated. Four or five readings were taken 
in this way, which gave an average value having a probable 
error of 10 or 30 per cent in the determination of the pressure 
variation. This method was used for making tests with eleven 
different observers — seven men and four women — through a 
range of frequencies from 130 to 2000 cycles per second. In 
this work and also in that using the second method specially 
constructed sound-proof booths were used. 
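
The step from attenuator reading to threshold pressure may be sketched as below. The sketch assumes that the system is linear and that the attenuator is calibrated in decibels, so that each 20 decibels of attenuation reduces the pressure tenfold; the text does not state the units of the attenuator, and the readings and reference pressure shown are hypothetical.

```python
from statistics import mean

def pressure_after_attenuation(reference_pressure, attenuation_db):
    """Pressure delivered after attenuation_db of loss, assuming a linear system
    and an attenuator calibrated in decibels (20 dB per tenfold pressure change)."""
    return reference_pressure * 10 ** (-attenuation_db / 20)

# Hypothetical attenuator readings at the point where the tone just became audible.
readings_db = [78.0, 80.0, 82.0, 80.0]
reference_pressure = 1.0  # pressure with no attenuation, in arbitrary units

threshold = pressure_after_attenuation(reference_pressure, mean(readings_db))
print(threshold)
```

Averaging four or five such readings, as described above, reduces the effect of the observer's uncertainty in judging the point of audibility.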

In the second method a thermal receiver unit small enough 
to be inserted in the external auditory meatus of the ear and to 
completely close it was used. It consisted of a series of short 
Wollaston wires enclosed in a small brass capsule with small 
holes communicating with the outside air. To calibrate it a 
small chamber was made in front of the diaphragm of a cali- 
brated condenser transmitter by means of a coupler designed to 
fit over the face of the transmitter. The volume of air thus 
entrapped was made equal to that in the ear canal. The 
thermal receiver was inserted in a small hole through the 
coupler and then all the joints were sealed so as to make the 
chamber air-tight. 

The determination of the relation between the voltage 
impressed upon the thermal receiver and the pressure exerted 
by it on the condenser transmitter diaphragm was then a simple 
matter. The voltage at the terminals of the thermophone was 
measured directly and the pressure exerted by it was calculated 
from the voltage produced at the terminals of the condenser 
transmitter. 

Using such a calibrated thermal receiver inserted in the 
ear canal, the minimum audible pressure was determined by 
noting the minimum current for audibility. Measurements of 
the threshold of audibility were made with five people using 
first an air-damped telephone receiver and then two of these 
thermal receivers. From a comparison of the results the air- 
damped telephone receiver was calibrated. The probable 
observational error in this determination was found to be 
8 per cent in the range of frequencies from 500 to 3000 
cycles. The air-damped telephone receiver was then used to 
measure the threshold intensity for 102 ears of men and 
women of various ages. After the measurements were made, 
an otological examination revealed that some of the persons 
tested had defects in their hearing. After all doubtful cases 
were eliminated, data on 72 ears remained which were used 
in the final averages given below. The tests were made in 
sound-proof booths. 

It is seen from the manner in which the measurements 
were made that the first method of calibration gives the pres- 
sure variation at the opening of the external ear provided that 
the ear reflects the sound waves in the same way as the con- 
denser transmitter when placed in the same position. Also in 
the second method the pressure variation which is computed is 
that which would be exerted upon the ear drum provided that 
it had the same stiffness as the condenser transmitter dia- 
phragm. Since the ear drum moves and its mechanical 
impedance at some frequencies may be comparable with that 
of the air chamber, the actual pressure variation against the 
ear drum may be somewhat less than that given by these 
observations. 

Since these two methods gave results which were approxi- 
mately the same it seems reasonable to assume that the pres- 
sures which are calculated by either method are not very 
greatly different from those which exist near the ear drum. 
It would be desirable to check this result by direct measure- 
ments of the pressure within the ear canal when apparatus 
and technic have been developed to a stage where this is 
possible. 

In the Kranz¹ method a small thermophone was also
inserted in the ear. It was constructed, however, so that the
pressure variation in the ear canal could be directly computed
by means of the Wente formula. Such a calculation depends
upon knowing the volume of air in the canal and upon the
assumption that all the walls including the ear drum are
rigid. A schematic of the arrangements which Kranz used is
shown in Fig. 69.

¹ Loc. cit.

Kranz also repeated the method first used by Wien. He
used a microscope to measure the amplitude of the motion of the
diaphragm of the telephone receiver when it was driven at
large amplitudes. For this purpose a small fibre of wood was
attached to the center of the diaphragm. In order that this
should not affect the results, when making threshold
measurements, the wooden fibre was replaced by a small piece of
brass which had the same weight.

[Figure 69. Schematic of the Kranz arrangement. Labelled parts:
vacuum-tube oscillator, filter, inductance, section of attenuation.]

Another means of determining the amplitude of the receiver
diaphragm consisted in using a mirror so mounted as to rotate
by movements of the diaphragm. A phosphor bronze wire of
some stiffness was mounted to project from the center of the
diaphragm and this rested against one sharp edge of a small
rectangle of steel, another sharp edge being held against a
copper block by the force exerted by a slight bending of the
phosphor bronze wire. It is thus seen that movements of the
diaphragm caused the wire to rock the steel piece about the
edge which rested on the copper as an axis. A small mirror
waxed on to the steel piece gave a broadening of a reflected
line of light when the receiver diaphragm vibrated.

A fair agreement was found between the results obtained 
with the thermophone and with the Wien method, except at the 
low frequencies. In general, the thermophone method gave
results which showed the ear to be slightly more sensitive. 
The work of Kranz was also done in sound-proof rooms. 

In 1922 Lane published some results¹ for the threshold
values for frequencies from 2000 to 18,000. In this investigation
the tone generator developed by Hewlett² was used as a
source of sound. The radiating surface of the tone generator 
consisted of a thin aluminum diaphragm 10 centimeters in 
diameter which was actuated by a flat coil of wire carrying an 
alternating current superimposed upon a direct current. The 
reaction between the electrical currents induced in the alu- 
minum diaphragm and those in the coil set the diaphragm 
into vibration, thus causing the sound to be radiated. By 
making certain reasonable assumptions, it was possible to
calculate the intensity of the sound at any given distance 
from this radiator when the alternating current actuating it 
was known. Lane’s work on the threshold of audibility was 
done out of doors with observers on a small platform 5 meters 
above the ground and with the source of sound 11 meters from
the ear. This was done at night so as to minimize the inter- 
ference effect of noise. 

In obtaining a final curve to represent the average pressure
variation necessary to excite the auditory sense of persons
having normal hearing, the results of only four of the
observers mentioned above were considered of sufficient
accuracy to be included. These observers are Wien, Kranz,
Fletcher and Wegel, and Lane. For values between 64 and
4096 the final results were obtained by assigning weights of 3,
14, and 72 to the results obtained by the first three observers
mentioned. These weights are proportional to the number of
ears tested in each case. The values for the higher frequencies
were obtained from Lane's data. A correction of 6 db was
applied to his data to bring them into line with the other data
at frequencies between 2000 and 4000 cycles. This is justified,
since the work was done out of doors where insect noise would
produce a slight shift in the threshold.

¹ “Minimum Sound Energy for Audition for Tones of High Frequency,” Physical
Review, May, 1922.

² “A New Tone Generator,” Physical Review (2), xix, January, 1922, p. 52.

The final values are given in Table XVI.

TABLE XVI

Frequency (dv)            64        128       256       512       1024
Pressure (bars)           .12       .021      .0039     .0010     .00052
Power (watts × 10⁻¹²)     35        1.06      .036      .0024     .00065

Frequency (dv)            2048      4096      8192      16,384    18,500
Pressure (bars)           .00041    .00042    .0025     .13       4.1
Power (watts × 10⁻¹²)     .00040    .00042    .015      41        400,000

The results of Wien in the region of 2000 cycles would indicate
that the ear was more sensitive by about 35 db. This difference
is much larger than any observational error would warrant.
However, the work was carefully done and the method should yield
correct results. The results of Kranz would indicate that in
the middle pitch range the ear sensitivity was about 4 SU
more than indicated by these figures and in the lower pitch
range was about equal. Inasmuch as observations were not
taken by the observers at all the octave frequencies given in the
table, values for these frequencies were obtained from a curve
connecting all the observed points. In this way the average
values given above were obtained. In the third row of this
table the values are expressed as the power in micro-microwatts
passing through a square centimeter when a sound wave is
travelling through air and producing the pressure changes
indicated. These results are also shown graphically in the
lower curve of Fig. 70.
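The relation between the pressure row and the power row of Table XVI can be sketched as follows. This is a minimal illustration, assuming the usual plane-wave relation (intensity equals pressure squared divided by the product of air density and sound velocity) and assuming a value of about 41.5 in c.g.s. units for that product; neither the constant nor the function name comes from the text.

```python
# Assumed (not from the text): density of air times velocity of sound,
# in c.g.s. units, is taken as roughly 41.5.
RHO_C = 41.5


def power_micro_microwatts(pressure_bars):
    """Plane-wave power through one square centimeter, in units of 1e-12 watt.

    pressure_bars is the r.m.s. pressure in the book's "bars" (dynes per cm^2).
    """
    ergs_per_second = pressure_bars ** 2 / RHO_C   # ergs per second per cm^2
    watts = ergs_per_second * 1e-7                 # 1 erg per second = 1e-7 watt
    return watts / 1e-12


# A few pressure entries copied from Table XVI.
for frequency, pressure in [(64, 0.12), (512, 0.0010), (2048, 0.00041)]:
    print(frequency, pressure, round(power_micro_microwatts(pressure), 5))
```

Under these assumptions the computed powers come out close to the tabulated 35, .0024, and .00040.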

The curve connecting the points is a smooth curve because 
each point represents an average of a large number of persons. 



The curve representing the threshold of audibility for a single 
person is never such a smooth curve. Two such individual 





An examination of the individual curves used to obtain 
the average values given above indicates that at some fre- 
quencies these curves depart as much as 20 db from the 
average. It is thus seen that each person has a hearing acuity 
which is peculiar to himself. Consequently, he learns to 
interpret the sounds about him with his particular hearing 
mechanism. The sensation produced by the same piece of 
music must be different for each person because of these 
individual variations. 

The values given in Table XVI show what minute changes 
in the air pressure can be detected by the ear when these 
changes take place with a rapidity equal to that produced by 
voice sound waves. Since the atmospheric pressure is approx- 
imately 1,000,000 bars, it is seen that if the pressure is changed 
one-billionth of its total value, such a change is sensed by the 
ear. Tones producing 1 per cent variation in pressure are so
intense as to injure the hearing mechanism. 

Feeling Intensity vs. Frequency 

When a tone exciting the ear is continuously increased in 
intensity, it finally reaches a loudness which produces the 
sensation of feeling. If the tone gets much louder than this 
value, it becomes painful. Although this threshold of feeling 
may have no relation to the auditory sense, it does serve as a 
practical upper limit for the intensities of tones which can be 
sensed by the ear. Measurements reported by Wegel¹ have
indicated that this threshold of feeling can be as definitely 
determined as the threshold of audibility. The results of his 
measurements on forty-eight normal ears are plotted in Fig. 70. 
The same apparatus was used for this work as that used in 
determining the threshold of audibility. Both the curves for 
the threshold of audibility and the threshold of feeling have 
been extrapolated until they intersect. The feeling sensation
in the middle range of frequencies is first a tickling sensation
and then it becomes actually painful as the loudness is
increased. In the lower frequency range the sensation of feeling
becomes milder until, at frequencies around 60 cycles, it is
sensed as a flutter. As the frequency is still further decreased
to the point where the two curves intersect, it is very difficult
to distinguish between the sensation of feeling and the
sensation of hearing. This same difficulty also exists at the upper
limit.

¹ Wegel, R. L., “The Physical Examination of Hearing and Binaural Aids for the
Deaf,” published in Proceedings of The National Academy of Sciences, Vol. 8, No. 7,
July, 1922.

Upper and Lower Pitch Limits 

As pointed out by Wegel, this constitutes the first rational 
method of defining what is meant by the upper and the lower 
pitch limits of audibility. It is readily seen from these two 
curves that both the upper and lower limit of audibility on 
the pitch scale will be entirely dependent upon the particular 
intensity at which the measurements are made. When this 
intensity is such that it is just producing the sensation of feeling 
and also of hearing, the upper or lower limit of audibility is 
reached. The two curves may be considered as representing 
the limits of audition both on the pitch and the intensity scales. 
They are boundary lines separating those tones which can be 
sensed from the tones which cannot be sensed by an average
normal ear. Parts of these boundary lines are dotted because 
accurate data are lacking for those particular regions. The 
two dotted lines on either side of the main boundary lines 
show the probable deviation of an observation made upon one 
particular person. In other words, one-half of such observa- 
tions will lie within the two dotted curves as indicated. 

A large amount of data has been taken by many observers 
on the upper and lower limits of audibility of pitch. An 
examination of most of this work indicates that not much 
attention was paid to the particular intensity used in such a 
determination. Also, the tones produced by whistles, by 
bowed strings, or by striking metal bars, the instruments 
usually used, were not pure tones. Also, it has been found 
that the variation in the sensitivity of the ears for high
frequencies is very large among different individuals. Usually
the number of individuals tested was too small to give a good
average. The limits shown in Fig. 70 were chosen after con- 
sideration of all the available data and also in view of our own 
experiments in the laboratory. It is seen that the limits on
the pitch scale chosen are from 20 to 20,000 cycles per second.
These are average values. There is no doubt that some persons
have a keen acuity for notes of high pitch and could
possibly hear notes having a frequency much higher than
20,000 cycles per second. Also, it is possible that the auditory
sense is stimulated in some persons by frequencies lower than 
20 cycles per second. Organ pipes having a pitch lower than 
this have been constructed, but the sensation produced by 
them is probably due to the overtones rather than the funda- 
mental tone being emitted. 

The area enclosed by the two curves giving the threshold 
of feeling and the threshold of audibility is called the “auditory
sensation area.” To each point in it there corresponds a
definite auditory sensation when the ear is acted upon by a 
tone having the frequency and the intensity indicated by the 
coordinates. Pure tones outside of this area produce no audi- 
tory sensation. It will be noticed that the scale of frequency 
and also the scale of intensity used in this chart are logarithmic. 
It is almost imperative that such scales be used when represent- 
ing such large ranges. As will be seen from the discussions 
in the next chapter, the choice of such a scale is more in keeping 
with the way one perceives changes in pitch and intensity. 



CHAPTER III 


Minimum Perceptible Differences in Sound 

There is a well-known law in psychology called the Weber- 
Fechner law which states that the increase of a stimulus neces- 
sary to produce a just discernible increase in the resulting 
sensation bears a constant ratio to the total stimulus. It is 
sometimes stated in the form that the magnitude of the sen- 
sation produced is proportional to the logarithm of the stimu- 
lus. If the same law applies to the hearing sensation, then the 
fractional increase in intensity, which is just perceptible as a
change in intensity, should be a constant independent of the 
intensity. Similarly, the minimum fractional increase in fre- 
quency, which is perceptible to the ear as a change in pitch, 
should be constant. A number of different observers have 
made attempts to determine these ratios. The apparatus 
available for this work seriously limited its accuracy. Organ 
pipes, tuning forks, and sometimes falling steel balls hitting 
upon steel plates were used as sources of sound. 
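The two statements of the Weber-Fechner law given above can be connected by a small numerical sketch. It assumes a fixed fractional increment (a hypothetical value of 10 per cent is used) and counts how many just discernible steps separate two stimulus values; the function names are illustrative only.

```python
import math


def count_steps(start, stop, fraction):
    """Count successive just-discernible increases from start up to stop,
    each increase being the same fraction of the stimulus reached so far."""
    steps = 0
    stimulus = start
    while stimulus * (1 + fraction) <= stop:
        stimulus *= 1 + fraction
        steps += 1
    return steps


def steps_from_logarithm(start, stop, fraction):
    """The same count expressed through the logarithm of the stimulus ratio."""
    return math.log(stop / start) / math.log(1 + fraction)


# Hypothetical constant ratio of 10 per cent over a thousandfold range.
print(count_steps(1.0, 1000.0, 0.10))
print(steps_from_logarithm(1.0, 1000.0, 0.10))
```

If the ratio is constant, the number of steps grows with the logarithm of the stimulus, which is the second form in which the law is stated.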

Minimum Perceptible Differences in Intensity 

The recent work of Knudsen was more accurate than that of any
of the previous observers, not only because of careful
observations, but because he used the accurate tools described in the
last chapter. 

A schematic of his apparatus¹ for determining these two
ratios is shown in Fig. 72. As indicated, the source of sound
was a telephone receiver actuated by the electrical current
from a vacuum tube oscillator. By means of the resistances
in the circuit, the output of the oscillator could be varied by
any desired measurable intervals. The circuit for measuring
minimum perceptible changes in intensity is so designed that
a motor-controlled key periodically changes the resistance R
across which the receiver is shunted at any desired time
intervals. The tone then emitted by the receiver will vary in
intensity at a rate depending upon the speed of the motor which
operates the key. Knudsen found that the best conditions for
determining the minimum change were obtained when this key
changed the intensities at a rate of about fifty times per minute.
The resistances are first adjusted so that the change in intensity
is plainly perceptible. This change is gradually reduced until
the just perceptible difference is determined.

[Figure 72.]

¹ Knudsen, V. O., “The Sensibility of the Ear to Small Differences of Intensity
and Frequency,” Physical Review, Vol. XXI, No. 1, January, 1923.

Knudsen's apparatus, however, limited his observations to
a frequency range of from 100 to 4000 cycles. It also limited
the intensity range. To extend both these ranges Bell Telephone
Laboratories took up the problem. The method used
involves the principle of beats. Electrical currents from two
oscillators producing slightly different frequencies are sent
into a special telephone receiver. When the receiver is held
to the ear beats are produced. The amount that the intensity
of the tone fluctuates can be controlled by changing the relative
magnitudes of the current from the two oscillators. At high
intensities and low frequencies, it was necessary to use a special
telephone receiver to avoid distortion. This receiver was of
the moving coil type and would reproduce pure tones throughout
the entire range in the auditory sensation area. The
arrangement of the apparatus is shown in Fig. 73. The
oscillator and the attenuator in circuit No. 1 were set so that
the telephone receiver produced a tone having any desired
frequency and sensation level. Oscillator No. 2 was then
adjusted so as to produce a tone of slightly different frequency.
As a result, beats were produced.
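The principle of beats can be illustrated with a short sketch. It assumes two pure tones whose pressures add linearly, so that the resultant amplitude swings between the sum and the difference of the two amplitudes, and it takes intensity as proportional to the square of amplitude. The convention used for the fractional change (swing relative to the minimum) is an assumption made for illustration and is not stated in the text.

```python
def beat_rate(frequency_1, frequency_2):
    """Number of beats per second produced by two tones."""
    return abs(frequency_1 - frequency_2)


def intensity_extremes(amplitude_1, amplitude_2):
    """Greatest and least intensity during one beat, in arbitrary units,
    taking intensity as the square of the resultant amplitude."""
    greatest = (amplitude_1 + amplitude_2) ** 2
    least = (amplitude_1 - amplitude_2) ** 2
    return greatest, least


def fractional_fluctuation(amplitude_1, amplitude_2):
    """Assumed convention: (greatest - least) / least."""
    greatest, least = intensity_extremes(amplitude_1, amplitude_2)
    return (greatest - least) / least


# A 1000-cycle tone beating with a much weaker 1003-cycle tone.
print(beat_rate(1000, 1003))
print(fractional_fluctuation(1.0, 0.01))
```

With the second amplitude one-hundredth of the first, the intensity fluctuates by roughly 4 per cent, which shows how a small added current gives fine control of the fluctuation.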



Figure 73. 


Tests were made to determine the differential sensitivity 
of the ear for different rapidities of fluctuations of intensity. 
It was found that the ear was most sensitive when the number 
of beats per second was kept between 1 and 6. When the
number became as low as 1 every 5 seconds or as high as 25
per second, then the change in energy necessary to be perceived
was about three times greater than that obtained when the
number of beats remained between 1 and 6. For this reason
in the experimental work the rate of 3 beats per second was
chosen as being best for perceiving small differences in intensity
and all the results reported below were obtained by using this 
rate. 

The minimum audible voltage of one attenuator, say, B₁,
was determined, B₂ being set far below the minimum audible
voltage. B₁ was then set at any desired value and B₂ adjusted
until the observer signalled that he heard beats. The setting
of B₂ was changed each time the observer signalled whether
or not the tone seemed to fluctuate. After about twenty such
judgments the operator was able to locate with considerable
certainty the setting of B₂ for which the observer was just
able to detect a fluctuation in intensity. If B₂ was set below
this value the fluctuation in intensity was imperceptible. The
r.m.s. voltage introduced into the receiver circuit by each
oscillator could be calculated from the readings of the
ammeters and the settings of the attenuators. At any frequency
the r.m.s. alternating pressure on the ear drum is proportional
to the voltage introduced into the receiver circuit. Complete
series of measurements were made on twelve male observers
at frequencies of 35, 70, 200, 1000, 4000, 7000, and 10,000
cycles per second and at intensities from weak tones near the
threshold of audition to very loud tones near the threshold of
feeling. The results of these measurements are shown in
Figs. 74, 75, and 76. In the first two figures mentioned the
abscissas represent sensation level and the ordinates represent
differential sensitivity. In Fig. 76 the data are arranged to
show the variation of differential sensitivity as the frequency
changes.

For sensation levels above 50 db, the fractional change in
the intensity which is just perceptible is between 5 per cent
and 10 per cent. At a sensation level of 10 db the fractional
increase must be 73 per cent to be perceptible. For frequencies
as low as 60 cycles a change of 20 per cent is just perceptible
at the high levels, and an increase as much as 200 or 300 per cent
is necessary at levels as low as 10 db. This work was done
by R. R. Riesz, who evolved the following formula for
representing the results:

$$\frac{\Delta J}{J} = S_\infty + (S_0 - S_\infty)\,10^{-n\alpha_s/10} \qquad (1)$$

where

$$S_\infty = .000015 f + \frac{126}{80 f^{.5} + f}$$

$$S_0 = .3 + .0003 f + \frac{193}{f^{.8}}$$

$$n = \frac{244{,}000}{358{,}000 f^{.125} + f^{2}} + \frac{.65 f}{3500 + f}$$

and where $\Delta J/J$ (called the differential sensitivity) is the
minimum fractional increase in the intensity that is just perceptible
and $\alpha_s$ the sensation level of the tone before the increase. It
is evident from this formula that $S_\infty$ is the differential
sensitivity for high sensation levels and $S_0$ the differential
sensitivity for $\alpha_s = 0$ or at the threshold of audibility.

For convenience in calculation, the values of $S_\infty$, $S_0$, and
$n$ are given in curves of Figs. 77 and 78. For example,
consider the values for a 1000-cycle tone. The values of $S_\infty$,
$S_0$, and $n$ are .051, 1.35, and .28, respectively. Then for this
tone

$$\frac{\Delta J}{J} = .051 + 1.3 \times 10^{-.028\alpha_s} \qquad (2)$$

[Figure 75. Ordinate: differential sensitivity. Abscissa: sensation level.]

[Figure 78.]
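The Riesz formula lends itself to direct computation. The sketch below is illustrative only: the constants are as read from the printed formula, whose typography is damaged in this copy, and they have been chosen so as to reproduce the 1000-cycle values quoted in the text (.051 and .28); the function names are not from the source.

```python
def s_infinity(f):
    """Differential sensitivity at high sensation levels (f in cycles per second)."""
    return 0.000015 * f + 126.0 / (80.0 * f ** 0.5 + f)


def s_zero(f):
    """Differential sensitivity at the threshold of audibility."""
    return 0.3 + 0.0003 * f + 193.0 / f ** 0.8


def exponent_n(f):
    """Exponent controlling how fast the sensitivity approaches s_infinity."""
    return 244000.0 / (358000.0 * f ** 0.125 + f ** 2) + 0.65 * f / (3500.0 + f)


def differential_sensitivity(f, sensation_level_db):
    """Minimum perceptible fractional increase in intensity."""
    s_inf = s_infinity(f)
    decay = 10 ** (-exponent_n(f) * sensation_level_db / 10.0)
    return s_inf + (s_zero(f) - s_inf) * decay


for level in (0, 10, 50, 100):
    print(level, round(differential_sensitivity(1000.0, level), 3))
```

For a 1000-cycle tone this gives a fractional increase of about three-quarters at a sensation level of 10 db and between 5 and 10 per cent above 50 db, in line with the figures stated above.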


Minimum Perceptible Differences in Frequency


To determine the minimum perceptible change in frequency,
Knudsen used the same apparatus as was used to
determine the minimum perceptible differences in intensity,
except that a small capacity was added and subtracted
periodically from the capacity in the oscillating circuit. By adjusting
this added capacity to the proper value, any change in pitch
that was desired could be obtained. The average values of
$\Delta f/f$ which he obtained are shown in Fig. 79. For the higher
and the lower ranges of pitch, the curves have been extrapolated
beyond the observed data. For frequencies between 500 and
4000 the minimum fractional difference in frequency which is
perceptible is .3 of 1 per cent. For the lower and the higher
ranges of frequencies, it requires a greater fractional change in
frequency to cause a perceptible change in pitch. Only a
small amount of data is available which shows how $\Delta f/f$
varies with different sensation levels but it indicates that
$\Delta f/f$ becomes larger in about the same way as $\Delta J/J$
becomes larger as the sensation level becomes lower. The data
given in Fig. 79 correspond to a sensation level of 40 db.



Fig. 79. — Minimum Perceptible Difference in Frequency. 


Minimum Time for Tonal Perception 

Another measurement which is of particular interest to 
psychologists is the minimum time a pure tone must excite the 
ear in order that it be sensed as a tone having a definite pitch. 
Here again the data available are rather discordant but the 
most probable values are given in Table XVII.

TABLE XVII

              Weak Tones                 Medium Tones
Freq.     Time (Sec.)    Cycles      Time (Sec.)    Cycles
128         0.0946        12.1
256                                    0.06908       17.6
384         0.0627        24.08        0.0445        17.1
512         0.0579        29.64        0.04274       21.8


From such limited data it is difficult to make any safe 
generalizations but it is seen that the time is approximately 
independent of the frequency of the tone and is about
one-twentieth of a second. Most of the unvoiced stop consonants
have a duration less than this in ordinary speech so that 
apparently no sense of pitch plays a part in their interpretation. 
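The "Cycles" entries of Table XVII are simply the frequency multiplied by the time, and this can be checked with a few lines. The pairing of each time with a frequency below is read from the table as printed and should be treated as an assumption where the print is unclear.

```python
def cycles_elapsed(frequency, seconds):
    """Number of complete vibrations occurring in the given time."""
    return frequency * seconds


# (frequency, time in seconds, cycles as printed) taken from Table XVII.
entries = [
    (128, 0.0946, 12.1),
    (256, 0.06908, 17.6),
    (384, 0.0627, 24.08),
    (384, 0.0445, 17.1),
    (512, 0.0579, 29.64),
    (512, 0.04274, 21.8),
]

for frequency, seconds, printed in entries:
    print(frequency, seconds, round(cycles_elapsed(frequency, seconds), 2), printed)
```

The products agree with the printed cycle counts to within rounding, so that while the time stays within a narrow range the number of cycles needed rises with the frequency.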

Levels of Frequency (Pitch) and Levels of Intensity (Phonic
Level)

In Part One, Chapter III, the sensation level of a sound 
reaching the ear is defined as the number of decibels above the 
average threshold for normal ears. The relation between 
sensation level and intensity of the sound is dependent upon 
the character of the sound so that for complex tones, no simple 
relation exists between the two quantities. As will be
discussed in a later chapter, two sounds are equally loud when
they produce the same magnitude of sensation. Experiments
have also shown that there is no simple relation between
loudness and intensity. For these reasons, the logarithmic
scale of intensity is called “level of intensity” rather than
“sensation level” or “loudness.” The most logical relation
then between the intensity level $\alpha$ and the intensity $J$ is given
by the equation

$$\alpha = A \log \frac{J}{J_0} \qquad (3)$$

where $A$ and $J_0$ are arbitrary constants to be chosen to give the
most convenient scale. If the bel is the unit used for
representing the level and the microwatt per square centimeter for
representing the intensity and also if the comparison intensity
is taken as one microwatt then

$$\alpha \text{ (bels)} = \log_{10} J \text{ (microwatts)}.$$

If the db is used, then A must be put equal to 10. As used in
the International Critical Tables the intensity level is called 
phonic level. According to this definition, the zero phonic 
or intensity level corresponds to the intensity of sound in a 
free plane wave when 1 microwatt of power flows through a
square centimeter. It also corresponds to the intensity operat- 
ing upon the ear drum when a pressure of approximately 
twenty bars is produced in the ear canal. As will be seen from 
the auditory sensation area chart, this zero level corresponds 
very nearly to the level which gives the maximum pitch range. 
Also, as will be evident from the data of Part One, Chapter 
III, it corresponds closely to the average speech intensity close 
to the mouth since the 10 microwatts of power flow through
an area of about 10 square centimeters as it is radiated into
the air. 

For these reasons it seems logical to choose this level as a 
standard for comparing sound levels existing in any acoustic 
field and it is so used in this book. Using this terminology, 
the sensation level is the difference between the phonic levels 
of the tone at the given intensity and at the threshold intensity. 
The letter $\alpha$ will be used to designate the intensity level, $\alpha_0$
being the particular value corresponding to the threshold.
Then the sensation level $\alpha_s$ is given by

$$\alpha_s = \alpha - \alpha_0. \qquad (4)$$
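The definitions of intensity level and sensation level can be put in computational form. The sketch assumes the decibel scale with a reference of one microwatt per square centimeter, as adopted above; the threshold intensity used in the example is the 1024-cycle power from Table XVI converted to microwatts, and the function names are illustrative.

```python
import math


def intensity_level_db(intensity_microwatts):
    """Intensity (phonic) level in db relative to 1 microwatt per cm^2."""
    return 10.0 * math.log10(intensity_microwatts)


def sensation_level_db(intensity_microwatts, threshold_microwatts):
    """Level of the tone minus the level of the threshold for that tone."""
    return (intensity_level_db(intensity_microwatts)
            - intensity_level_db(threshold_microwatts))


# Threshold at 1024 cycles: .00065e-12 watt = 6.5e-10 microwatt per cm^2.
threshold = 0.00065e-12 / 1e-6

print(round(intensity_level_db(1.0), 2))             # reference intensity
print(round(intensity_level_db(threshold), 2))       # level of the threshold
print(round(sensation_level_db(1.0, threshold), 2))  # sensation level of 1 microwatt
```

On this scale the threshold near 1000 cycles lies about 92 db below the zero level, so a tone at the zero level has a sensation level of about 92 db.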

There is a definite relation between pitch as sensed by the 
ear and the frequency of vibration, so it seems reasonable to call 
the logarithmic scale of frequency a pitch scale; that is, the 
level of frequency is the pitch. In music, this level is deter- 
mined by the position of the note upon the musical staff. For 
representing the quantitative relations in audition, then,

$$P = A \log \frac{f}{f_0}. \qquad (5)$$

The most natural unit to use for measuring pitch is the octave
which means that the base of logarithms to use is 2. Also
if $f$ is measured in kilocycles and the reference pitch is taken
as 1 kilocycle, then

$$P \text{ (octaves)} = \log_2 f \text{ (kilocycles)} = 3.32 \log_{10} f. \qquad (6)$$

The reference pitch chosen corresponds approximately to
“high C” on the musical staff. The pitch steps on the
chromatic scale are all equal to a semi-tone and correspond to a
frequency ratio of $2^{1/12}$. For readily transferring from the
musical notation to pitch numbers, the table below is given.
As indicated these numbers are for international standard
pitch and for the first octave above “high C.”

Chromatic Scale

(a''' = 1760 cycles per second)

Musical Notation      Pitch Numbers
c'''                  0.066
c♯                    0.149
d                     0.232
d♯                    0.315
e                     0.398
f                     0.482
f♯                    0.566
g                     0.649
g♯                    0.732
a                     0.816
a♯                    0.899
b                     0.982
c''''                 1.066

For physicists' pitch (c''' = 1024) subtract .032 from each pitch number.
For each octave above the one represented add 1 and for each octave below subtract
1 from these numbers.

A pitch unit which is 1/100 of an octave has been found
convenient for practical use. Such a unit is logically called a
centi-octave. It is approximately 1/8 of a semi-tone.
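The pitch scale of equation (6) and the centi-octave unit can be illustrated by a short sketch, assuming the reference pitch of 1 kilocycle adopted above; the function names are illustrative only.

```python
import math


def pitch_octaves(frequency_cycles):
    """Pitch in octaves above 1 kilocycle (negative below it)."""
    return math.log2(frequency_cycles / 1000.0)


def pitch_centi_octaves(frequency_cycles):
    """Pitch in hundredths of an octave above 1 kilocycle."""
    return 100.0 * pitch_octaves(frequency_cycles)


def frequency_from_pitch(octaves):
    """Frequency in cycles per second for a pitch given in octaves."""
    return 1000.0 * 2.0 ** octaves


print(round(pitch_octaves(1760.0), 3))        # a''' of the chromatic table
print(round(pitch_octaves(1024.0), 3))        # offset for physicists' pitch
print(round(100.0 / 12.0, 2))                 # centi-octaves in one semi-tone
print(round(frequency_from_pitch(-5.0), 2))   # pitch of -500 centi-octaves
```

The first value agrees with the tabulated 0.816 for a''', and a semi-tone comes out at about 8.3 centi-octaves, so that a centi-octave is roughly one-eighth of a semi-tone.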

To enable one to readily transfer from pitch to frequency
or the reverse, a curve showing the relation expressed in
equation (6) is shown in Fig. 80. For comparison purposes,
the width of one full tone step on the major scale is shown on
this chart. A curve showing the relation between intensity
and intensity level is given in Fig. 81. Using these scales of
pitch and intensity level, the boundary lines of the auditory
sensation area have been replotted and are shown in Fig. 82.
The lower curve gives the values of $\alpha_0$ and the upper curve the
values of $\alpha_m$.

If the minimum and maximum audibility curves were
plotted on an energy scale, the perceptible increment $\Delta E$ near
the maximum audibility curve would be a million million times
larger than its length near the minimum audibility curve,
whereas when they are plotted on these new scales, the
perceptible increment in level remains approximately constant,
changing by less than a factor 10 for 90 per cent of the distance
across the auditory sensation area.

[Fig. 81. Relation between Sound Intensity and Intensity Level.
Abscissa: intensity (microwatts per cm²), 0 to 10. Ordinate scale: 0 to 1.0.
A note on the chart directs that, for intensities outside the plotted
range, the decimal point be shifted to bring the number within the
range, the intensity level read for the number thus changed, and 1
added to or subtracted from it for each position the decimal point is
shifted to the left or right respectively.]

[Figure 82.]

From equation (3) it is seen that the minimum perceptible
level difference $\Delta\alpha$ expressed in db is related to $\Delta J/J$ by the
equation

$$\Delta\alpha = 10 \log_{10}\left(1 + \frac{\Delta J}{J}\right) \approx 4.34\,\frac{\Delta J}{J}, \qquad (7)$$

the last relation holding only when $\Delta J$ is small compared to $J$.
Substituting the values of $\Delta J/J$ from equation (1) into this
equation, a set of values of $\Delta\alpha$ for all pitches and levels can be
obtained. A representation of these values is given on the
auditory sensation area chart in Fig. 83. Each small number
gives the value of $\Delta\alpha$ corresponding to its position in the
auditory sensation area. It will be seen that the differential
sensitivity for intensity varies from .21 to 9 db, depending
upon the position in the auditory sensation area. An
inspection of these numbers shows that one decibel represents about
the change that is perceptible under average conditions,
sometimes being a little more and sometimes a little less.
For this reason it has proved to be a very convenient size.
The fact that $\Delta\alpha$ is not the same at various positions shows
that the Weber-Fechner law does not hold rigorously.

[Figure 83. Abscissa: pitch (centi-octaves above 1 kilocycle).]
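The conversion from a fractional change in intensity to a level difference in db can be sketched as below. The exact form is ten times the common logarithm of one plus the fraction; the linear form with the factor 10 / log_e 10 (about 4.34) is the small-change approximation. The function names and the sample fractions are illustrative assumptions.

```python
import math


def level_difference_db(fraction):
    """Exact level difference in db for a fractional intensity increase."""
    return 10.0 * math.log10(1.0 + fraction)


def level_difference_small(fraction):
    """Linear approximation, valid only when the fraction is small."""
    return (10.0 / math.log(10.0)) * fraction


for fraction in (0.05, 0.10, 0.73, 7.0):
    print(fraction,
          round(level_difference_db(fraction), 3),
          round(level_difference_small(fraction), 3))
```

A fractional change of 5 per cent corresponds to about .21 db and a sevenfold increase to about 9 db, which are the extremes quoted above; the two forms agree closely only for the small fractions.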

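Similarly, $\Delta P$ expressed in centi-octaves is related to $\Delta f/f$
by the equation

$$\Delta P = 100 \log_2\left(1 + \frac{\Delta f}{f}\right) \approx \frac{100}{\log_e 2}\,\frac{\Delta f}{f}, \qquad (8)$$

the last relation holding only when $\Delta f$ is small compared to $f$.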

A comparison of the values of $\Delta P$ obtained by applying
equation (8) to the Knudsen data with the values of $\Delta\alpha$ as
shown in Fig. 83 shows that within the observational error
and within the limited ranges of frequency and intensity
explored, they are the same. Therefore, until more extended
data are obtained for $\Delta P$, it will be assumed that the figures
shown on the chart of Fig. 83 can be taken as values of $\Delta P$ as
well as values of $\Delta\alpha$. That this is so is not entirely a
coincidence for the choice of decibels and centi-octaves for
representing $\Delta\alpha$ and $\Delta P$ made this approximately true.
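The corresponding conversion for pitch can be sketched in the same way, taking a fractional change in frequency and expressing it in centi-octaves. The sample fraction of .3 of 1 per cent is the figure given earlier for frequencies between 500 and 4000 cycles; the function names are illustrative.

```python
import math


def pitch_difference_centi_octaves(fraction):
    """Exact pitch difference in centi-octaves for a fractional frequency change."""
    return 100.0 * math.log2(1.0 + fraction)


def pitch_difference_small(fraction):
    """Linear approximation, valid only when the fraction is small."""
    return (100.0 / math.log(2.0)) * fraction


print(round(pitch_difference_centi_octaves(0.003), 4))
print(round(pitch_difference_small(0.003), 4))
```

A change of .3 of 1 per cent in frequency thus amounts to a little under half a centi-octave, which is of the same order as the values of the level difference in db found for moderate sensation levels.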


Number of Distinguishable Tones in Audition Range

The relations given above make it possible to calculate the
number of pure tones which the ear can perceive as being
different. For example, if starting at the minimum audibility
curve, ordinate increments that are successively equal to the
values of $\Delta\alpha$ at the sensation levels corresponding to the
successive positions, are laid off along a constant pitch line, then
the number of such increments between the curves for the
threshold of audibility and of feeling is equal to the number of
pure tones of constant pitch that can be perceived as being
different in intensity. A rough estimate of the number of such
tones can be made by an inspection of the chart shown in Fig.
83. More accurate values may be obtained as follows:

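The number $\delta N$ of distinguishable gradations in intensity
for a small intensity level difference $\delta\alpha$ is given by

$$\delta N = \frac{\delta\alpha}{\Delta\alpha}$$

so that

$$N = \int_{\alpha_0}^{\alpha_m} \frac{d\alpha}{\Delta\alpha} \qquad (9)$$

which gives the number of distinguishable differences in
intensity along any pitch line, $\alpha_0$ and $\alpha_m$ being where the pitch
line intersects the boundary curves of the auditory sensation
area.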

Using the relations given by equations (1) and (7), the
value of this integral will be found to be

$$N = \frac{1}{n S_\infty} \log_e \frac{S_\infty\,10^{(\alpha_m - \alpha_0)n/10} + S_0 - S_\infty}{S_0}. \qquad (10)$$

If the Weber-Fechner law held accurately, then $S_\infty$ and $S_0$
would be equal so that this expression would reduce to

$$N = \frac{\alpha_m - \alpha_0}{\Delta\alpha} \qquad (11)$$

as it should, for this states that the number of distinguishable
tones is equal to the total number of db across the auditory
sensation area divided by the number of db for one tone
change.

Approximate values of $N$ can be obtained directly from the
chart of Fig. 83. The number of distinguishable tones in each
line of 10 db change of level may be taken as ten times the
reciprocal of the value of $\Delta\alpha$ corresponding to that line.
Therefore, if the reciprocals of the numbers along any pitch
line are added together, the sum will be one-tenth the total
number of distinguishable tones.
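
The chart procedure just described can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical and the sample values of delta-alpha are placeholders, not readings from Fig. 83.

```python
def tones_along_pitch_line(delta_alpha_per_line, line_spacing_db=10.0):
    """Approximate number of distinguishable intensity steps on one pitch line.

    Each entry is the value of delta-alpha (in db) read from the chart for one
    line of `line_spacing_db` change of level; that line then contributes
    line_spacing_db / delta_alpha steps.
    """
    if any(d <= 0 for d in delta_alpha_per_line):
        raise ValueError("every delta-alpha value must be positive")
    return line_spacing_db * sum(1.0 / d for d in delta_alpha_per_line)


# Placeholder readings (assumed for illustration, NOT taken from Fig. 83)
example_readings = [2.0, 1.0, 0.5, 0.5, 0.4]
print(tones_along_pitch_line(example_readings))
```

The sum of reciprocals is the "one-tenth" quantity mentioned in the text; multiplying by the 10 db spacing gives the estimate of $N$.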



Values of $N$, together with the values of the parameters
necessary for its calculation, are shown in Table XVIII. It is
interesting to note that for frequencies as low as 30 cycles the
auditory sensations are very indistinct. It is seen that very
little sense of loudness change is possible in this range. The
hearing and feeling sensations are difficult to distinguish.
There are more than ten times as many distinguishable changes
at frequencies between 1000 and 2000 cycles as at a frequency
of 60. In the last column of Table XVIII the average values
of $\Delta\alpha$ for the entire intensity range are given. They are
obtained by dividing the numbers in the third column by the
numbers in the seventh column.


TABLE XVIII

$P$      $f$       $\alpha_m - \alpha_0$   $S_\infty$     $S_0$    $n$     $N$      $\Delta\alpha$

−500     31.1      30                      .263           12.61    .448    3.11     9.65
−400     61.5      65                      .182           7.45     .415    34.2     1.90
−300     125       91                      .126           4.33     .386    93.8     .970
−200     250       113                     [illegible]    2.70     .358    188.9    .598
−100     500       127                     .062           1.79     .318    299      .425
0        1,000     134                     .051           1.37     .276    374      .358
+100     2,000     131                     .053           1.07     .286    358      .366
+200     4,000     121                     .074           1.75     .361    259      .467
+300     8,000     96                      .128           2.86     .456    119      .807
+400     16,000    41                      .240           5.18     .534    16.34    2.510
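
As a quick check of the rule stated for the last column of Table XVIII (third column divided by seventh), the following sketch recomputes the average $\Delta\alpha$ for several rows that are clearly legible in this copy. The data structure and function name are illustrative only.

```python
# frequency in cycles -> (alpha_m - alpha_0 in db, N, tabulated average delta-alpha)
# Only rows whose entries are clearly legible are included here.
legible_rows = {
    125: (91, 93.8, 0.970),
    500: (127, 299, 0.425),
    1000: (134, 374, 0.358),
    4000: (121, 259, 0.467),
    16000: (41, 16.34, 2.510),
}


def average_delta_alpha(level_range_db, n_steps):
    """Average db per distinguishable step across the whole intensity range."""
    return level_range_db / n_steps


for cycles, (level_range, n_steps, tabulated) in legible_rows.items():
    computed = average_delta_alpha(level_range, n_steps)
    print(f"{cycles:>6} cycles: computed {computed:.3f}, tabulated {tabulated:.3f}")
```

The computed and tabulated values agree to within rounding in the third decimal place.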


In a similar way the number of tones of different pitch
can be obtained. The number of distinguishable differences
$\delta N$ in a small pitch interval $\delta P$ is given by
$\delta N = \delta P / \Delta P$. The total number of different
gradations is then

$$N = \int_{P_1}^{P_2} \frac{dP}{\Delta P}$$

where $P_1$ and $P_2$ are the pitch limits.

The number of perceptible differences in pitch $N$ depends
upon the path taken through the auditory sensation area.
There are three paths of particular interest, namely, (1) having
a constant intensity level, (2) having a constant sensation
level, (3) having a constant loudness level. The value of $N$ for
any of these can be obtained by graphical methods by plotting
values of $1/\Delta P$ corresponding to each position in the auditory
sensation area traversed and measuring the area under the
resulting curve limited by $P_1$ and $P_2$. For example, in Fig. 84
are shown paths for equal levels: the solid curved lines are
sensation levels, the dotted lines are loudness levels, and the
horizontal solid lines are intensity levels. The dotted curves are
taken from data to be discussed later.
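
The graphical method (area under the curve of $1/\Delta P$ between the pitch limits) has a simple numerical counterpart. The sketch below uses the trapezoidal rule; the function name and the sample values of $\Delta P$ are assumptions made for illustration and are not data from Fig. 83 or Fig. 84.

```python
def count_pitch_steps(pitches, delta_p):
    """Trapezoidal estimate of the area under 1/delta_P along a path.

    pitches : increasing pitch values (centi-octaves) sampled along the path
    delta_p : minimum perceptible pitch difference at each sampled pitch
    """
    if len(pitches) != len(delta_p) or len(pitches) < 2:
        raise ValueError("need two or more matching samples")
    total = 0.0
    for i in range(len(pitches) - 1):
        width = pitches[i + 1] - pitches[i]
        if width <= 0:
            raise ValueError("pitches must be strictly increasing")
        # Average of the two reciprocal ordinates times the interval width
        total += 0.5 * (1.0 / delta_p[i] + 1.0 / delta_p[i + 1]) * width
    return total


# Placeholder samples (assumed, not measured values)
print(count_pitch_steps([-500.0, 0.0, 400.0], [5.0, 1.0, 2.0]))
```

With finer sampling along each of the three paths, the same routine would give the three values of $N$ for that path.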

The number of distinguishable changes in pitch as the tone
changes along these lines is given in Table XIX. It is seen
that in the process of changing the pitch of a pure tone from
the highest to the lowest audible pitch, there are approximately
2000 perceptible gradations in pitch. The maximum number
of perceptible gradations is obtained when the intensity level
is kept constant at about zero level. As stated in an earlier
chapter, there are about three times this number of rows of rods
containing nerve endings in the basilar membrane, each row
containing 4 or 5 rods terminating in 10 or 15 hair cells. (See
Fig. 63.) If the tone is kept at a constant sensation level of
20 db while it is varied in pitch, there will be approximately
500 gradations in pitch. At lower sensation levels the number
of gradations is still smaller. It is thus seen that at these low
levels the position of maximum response must shift over more
than ten rows of rods before the pitch change is perceived.
This indicates that only a very few of the nerve endings in the
stimulated area are activated at these low levels. As the
intensity increases, more and more of them become activated.
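
The "more than ten rows" figure follows from simple division. The sketch below takes the number of rows of rods as three times the roughly 2000 gradations quoted in the text; that product is an approximation adopted here only for illustration.

```python
def rows_per_pitch_step(total_rows, perceptible_steps):
    """Rows of rods spanned by one just-perceptible change of pitch."""
    return total_rows / perceptible_steps


# "About three times" the roughly 2000 gradations (approximate figure)
rows_of_rods = 3 * 2000

print(rows_per_pitch_step(rows_of_rods, 500))   # constant 20 db sensation level
print(rows_per_pitch_step(rows_of_rods, 2000))  # most favorable case
```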


TABLE XIX

Intensity Level        Sensation Level        Loudness Level
Level, db     N        Level, db     N        Level, db     N

  −60        590          20        520          20        460
  −40       1240          40       1270          40       1010
  −20       2050          60       1640          60       1690
    0       2340          80       2180          80       2120


If an ordinate increment of $\Delta\alpha$ and an abscissa increment
of $\Delta P$ be drawn in the auditory sensation area diagram, a
small rectangle will be formed which may be considered as
forming the boundary lines for a single pure tone. All tones
which lie in this area sound alike to the ear. The number of
such small rectangles in the auditory sensation area corresponds
to the number of pure tones which can be perceived as being
different. The number $\delta N$ of such tones in a small rectangle
$\delta P \cdot \delta\alpha$ is
$\delta N = \dfrac{\delta P\, \delta\alpha}{\Delta\alpha\, \Delta P}$.
The total number of such tones in the auditory sensation area is
then given by

$$N = \iint \frac{dP\, d\alpha}{\Delta\alpha\, \Delta P}$$

The value of this integral was obtained by obvious graphical
methods and was found to be approximately 540,000. One
can obtain a number which is approximately correct for the
number of distinguishable tones by the following simple
procedure. Since $\Delta\alpha$ and $\Delta P$ are equal and given by the number
on each square on Fig. 83, the number of tones in each square
is 500 divided by $\Delta\alpha\, \Delta P$, or by the square of the number in the
center of each square. If the results for each square thus
obtained are added together, the total will be the desired
number of distinguishable tones.
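
The square-by-square procedure can be written out as follows. The default of 500 is simply the figure quoted in the text (presumably the area of one chart square in db times centi-octaves); the function name and the sample chart numbers are placeholders, not values read from Fig. 83.

```python
def tones_from_chart(square_values, square_area=500.0):
    """Sum over chart squares of square_area / value**2.

    square_values : the number printed in the center of each square, taken
                    as both delta-alpha and delta-P for that square
    """
    if any(v <= 0 for v in square_values):
        raise ValueError("chart values must be positive")
    return sum(square_area / (v * v) for v in square_values)


# Placeholder chart numbers (assumed for illustration only)
print(tones_from_chart([1.0, 2.0, 5.0]))
```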

One might well ask the question: How many complex 
sounds which are different can be sensed by the ear? At first 
thought, one might say that this number is represented by all 
the possible combinations of pure tones. Of course, such a 
number would be entirely too large, for some of these would 
sound alike to the ear, since the louder tones would necessarily 
mask the feebler ones. It is evident, however, that the number 
of such complex sounds will be very much larger than the 
number of pure tones. 

Position Along the Basilar Membrane for Sensing Tones of
Various Pitch

These data make it possible to calculate the position along
the basilar membrane where the pure tones of different pitch
are sensed. According to the theory of hearing discussed in
Chapter I, a change in pitch corresponds to a change in the
position of the maximum response of the basilar membrane.
It seems reasonable to assume that the nerve terminals are
uniformly distributed throughout the length of this membrane.
Such an assumption is supported by the anatomical facts
available. Consequently, it would be expected that for each
perceptible step in pitch, the position of maximum response
shifts the same amount. Since the calculation given above
showed that there were approximately 1600 perceptible changes
along the 60 db loudness level line, each step must correspond
to .02 millimeter shift in the position of maximum response.
The tone having the highest pitch, namely, 432, corresponding
to 20,000 cycles, is sensed at the oval window and the one
having the lowest pitch, namely, −564, corresponding to 20
cycles, is sensed at the helicotrema.
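
The pitch numbers quoted here are consistent with pitch being measured in centi-octaves from 1000 cycles. The sketch below assumes that relation, $P = 100 \log_2(f/1000)$, which is inferred from the paired values in the text rather than quoted from it; the function names are illustrative.

```python
import math

REFERENCE_CYCLES = 1000.0  # frequency taken as zero pitch


def pitch_from_frequency(cycles):
    """Pitch in centi-octaves relative to 1000 cycles (assumed definition)."""
    return 100.0 * math.log2(cycles / REFERENCE_CYCLES)


def frequency_from_pitch(pitch):
    """Inverse of pitch_from_frequency."""
    return REFERENCE_CYCLES * 2.0 ** (pitch / 100.0)


print(round(pitch_from_frequency(20000)))  # highest audible pitch in the text
print(round(pitch_from_frequency(20)))     # lowest audible pitch in the text

# Membrane length implied by 1600 steps of .02 millimeter each
print(1600 * 0.02)
```

The last line shows that the step size quoted corresponds to a membrane about 32 millimeters long.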

Starting at the oval window the first step of .02 millimeter
corresponds to a step of pitch from 432 to 430.5, since $\Delta P$ is
1.5 (see Fig. 83) in this region. In this way it will be found
that the first 10 steps, corresponding to a distance of one-fifth
millimeter from the oval window end, are concerned with
pitches from 432 to 418, corresponding to frequencies from
20,000 cycles to 18,000 cycles. The next 10 steps go from
418 to 405; the next 10 from 405 to 394, and so on. In this
way the position on the basilar membrane for each pitch can
be obtained.

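It is evident that this procedure is equivalent to the
following mathematical process: Let $l$ be the distance from the
oval window end to the position of maximum response, $l_0$
the total length of the membrane, and $E$ the distance for each
perceptible step. Then

$$l = E \int_{P}^{P_2} dN = l_0\, \frac{\int_{P}^{P_2} dN}{\int_{P_1}^{P_2} dN}, \qquad dN = \frac{dP}{\Delta P} \qquad (14)$$

where $P_2$ is taken as the highest audible pitch, which is sensed
at the oval window.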


The denominator of this fraction is the number of perceptible 
steps and has been calculated above. The numerator can be 
obtained from the same graph that was used in calculating the 
denominator. In this way the curve shown in Fig. 85 was 
obtained. It was from this figure that the positions given in 
Fig. 66 were located. 
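
The stepping procedure amounts to taking the fraction of all perceptible steps that lie between the oval window and the pitch in question. A minimal numerical sketch follows. The function names, the 32 millimeter length (1600 steps of .02 millimeter) and the constant $\Delta P$ used in the example are assumptions for illustration; a real calculation would use the measured $\Delta P$ along the chosen level line.

```python
def steps_between(p_low, p_high, delta_p, samples=2000):
    """Midpoint-rule estimate of the integral of dP / delta_p(P)."""
    width = (p_high - p_low) / samples
    return sum(width / delta_p(p_low + (i + 0.5) * width) for i in range(samples))


def position_mm(pitch, p_lowest, p_highest, delta_p, membrane_length_mm=32.0):
    """Distance from the oval window at which `pitch` is sensed.

    The highest pitch is placed at the oval window (0 mm) and the lowest at
    the far end of the membrane.
    """
    total_steps = steps_between(p_lowest, p_highest, delta_p)
    steps_from_window = steps_between(pitch, p_highest, delta_p)
    return membrane_length_mm * steps_from_window / total_steps


def uniform_delta_p(pitch):
    """Hypothetical constant difference limen, for illustration only."""
    return 1.5


print(position_mm(0.0, -564.0, 432.0, uniform_delta_p))
```

With a constant $\Delta P$ the mapping is linear in pitch; the curvature of the actual position curve comes entirely from the variation of $\Delta P$ with pitch.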

It will be seen that a tone of zero pitch corresponding to
a frequency of 1000 cycles is sensed in the middle of the
membrane. The range used in sensing the speech sounds is from
7 to 28 millimeters from the oval window, which indicates that
about two-thirds of the entire length is used for such purposes.
When more accurate data for values of $\Delta P$ are available it will
be interesting to compare the results calculated for these
positions at different levels (sensation, intensity, or loudness).
If they are very different some interesting conclusions
concerning pitch and loudness might result. This method of
locating the positions on the basilar membrane where tones of
different pitch are sensed is due to Wegel and Lane.¹

Figure 85. Position on basilar membrane. (Abscissa: millimeters
from oval window.)

¹ Wegel, R. L., and Lane, C. E., “Auditory Masking and Dynamics of the Inner
Ear,” Physical Review, February, 1924.



CHAPTER IV 


Masking Effects 

It is a common experience that when any sound is impressed
upon the ear it reduces the ability of the ear to sense other
sounds. If, while a sound A is being impressed upon the ear,
another sound B is gradually increased in intensity until the
sound A can no longer be heard, the sound A is said to be
masked by the sound B. When the ear is stimulated by a
sound, particular nerve fibres terminating in the basilar
membrane are caused to discharge their unit loads. Such nerve
fibres then can no longer be used to carry any other message
to the brain by being stimulated by another source of sound.
Masking experiments appropriately chosen, then, should enable
us to determine what portions of the membrane are stimulated
by any external sound.

A. A. Mayer ¹ was one of the first to point out the
experimental fact that low-pitched sounds had a masking effect
different from that of high-pitched sounds. He stated that a tone
of low pitch will completely mask one of higher pitch but that a
tone of high pitch will not mask a tone of lower pitch. The
apparatus which he used, however, made it very difficult to
control the intensity or the purity of the tones used. On
account of its importance, the problem of masking has been
studied rather extensively at Bell Telephone Laboratories and
the investigation is still being carried on.

Masking of Pure Tones by Pure Tones 

The masking effect of one pure tone by another was
determined by means of apparatus which was similar to that used in
the determination of the acuity of hearing described in Chapter
II. A damped telephone receiver was used for generating the
pure tones. Connected to this receiver were two vacuum tube
oscillators equipped with filters for eliminating any harmonics
and with attenuators for supplying any magnitude of current.
The attenuators were arranged so that by turning a dial the
intensity level of the tone could be reduced very quickly from
the maximum value to a value below the threshold. The
intensity level for the threshold was determined both for the
masked and the masking tones. The masking tone was then
kept at a constant sensation level while the tones of other pitch
were gradually increased in intensity until they were just
perceptible in the presence of the masking tone. The level
expressed in decibels that the masked tone was raised above its
threshold value in the quiet is called the threshold shift.

¹ Mayer, A. A., Phil. Mag., ii, 500, 1876.

The results of these measurements are shown in the curves 
of Fig. 86. The frequency of vibration of the masking tone is 
given by the number at the top of each chart and its sensation 
level by the number on each curve. The frequency of vibration 
of the masked tone is given by the abscissa and the threshold 
shift of the masked tone by the ordinate. 

For example, in the fourth chart the masking effects of a 
tone having a frequency of 1200 cycles are shown. It is seen 
that the greatest masking effect is near 1200 cycles, which is the 
frequency of the masking tone. A tone of 1250 cycles must be 
raised to 46 db above the threshold to be perceived in the 
presence of a 1200-cycle tone which is 60 db above its threshold, 
or it must be raised to within 14 db of the masking tone before 
it is perceived. This corresponds to an intensity ratio between 
the tones of only 25. A tone of 3000 cycles, however, can be 
perceived in the presence of a 1200-cycle tone which is at 60 db 
when it is only 8 db above its threshold. This means that
the intensity ratio between these two tones, under such cir- 
cumstances, corresponds to 52 db or to a ratio of approximately 
160,000 in intensity. 
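
The intensity ratios quoted in this example follow directly from the definition of the decibel, a level difference of $d$ db corresponding to an intensity ratio of $10^{d/10}$. A short check, with an illustrative function name:

```python
def intensity_ratio(level_difference_db):
    """Intensity ratio corresponding to a level difference in decibels."""
    return 10.0 ** (level_difference_db / 10.0)


# 1250-cycle tone: masking tone at 60 db, masked tone heard at 46 db
print(intensity_ratio(60 - 46))

# 3000-cycle tone: masking tone at 60 db, masked tone heard at 8 db
print(intensity_ratio(60 - 8))
```

The first ratio is close to the 25 quoted in the text and the second is close to 160,000.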

However, as the loudness of the masking tone is increased,
all of the high tones must be increased to fairly large values
before they can be heard. For example, the high frequencies
must be raised 75 db above the threshold to be heard in the
presence of a 1200-cycle tone having a sensation level of 100
db. But even for such large intensities for the masking tone,
those frequencies below 300 are perceived by raising their
loudness only slightly above the threshold value. It should
be noticed that in all cases, those tones having frequencies
near the masking frequency, whether they are higher or lower,
are easily masked.

It is thus seen that Mayer's conclusion that a low-pitched 
sound completely obliterates higher pitched tones of con- 
siderable intensity and that higher pitched frequencies will 
never obliterate lower pitched tones is true only under certain 
circumstances. A low tone will not obliterate to any degree 
a high tone far removed in frequency, except when the former 
is raised to very high intensities. Also a tone of higher fre- 
quency can easily obliterate a tone of lower frequency if the 
frequencies of the two tones are near together. When the two 
tones are very close together in pitch, the presence of the 
masked tone is perceived by the beats it produces. This 
accounts for the sharp drop in the curves at these frequencies. 
A similar thing happens for those frequency regions correspond- 
ing to harmonics of the masking tone. In the charts for the 
200- and 400-cycle masking tones these drops are not shown,
inasmuch as they were small, but in an accurate picture they 
should be shown. 

These results are plotted in a different way in Fig. 87. The 
abscissas represent the loudness of the masking tones, the 
frequency of which is indicated at the top of each of the charts. 
The amounts that the threshold of the masked tone is shifted 
are plotted as ordinates as in the previous figure. 

For example, in the first chart the results are shown for a 
masking tone of 200 cycles. The curve marked 3000 indicates 
the masking effect of a 200-cycle tone upon a 3000-cycle tone. 
It is seen that the sensation level of the low-pitched tone can be 
raised to 55 db before it has any interfering effect upon the 
high-pitched tone. For higher levels than this it has a very 
marked effect. 









Fig. 87. Monaural Masking. (Abscissa of each chart: sensation
level of Fp.)

It will be noticed that in nearly all of the charts the curves
for different frequencies intersect. This leads to some rather
interesting conclusions regarding the perception of a complex
tone. If, for example, a complex tone had three frequencies
of 400, 300, and 2000 cycles with levels of 50, 10, and 10,
respectively, the ear would hear only the 400- and 2000-cycle
tones, as is evident from the masking curves for 400 cycles.
It would be necessary to raise the 300-cycle tone an additional
6 db for it to be heard in the presence of 400 cycles at a
sensation level of 50. However, if the complete sound were
magnified 30 db without distortion so that the three tones
had levels of 80, 40, and 40, respectively, then the 400- and
300-cycle tones only would be heard. Under such conditions,
the 300-cycle tone could be attenuated approximately 8 db
before it would disappear. These conclusions will be somewhat
modified when all of the tones are sounding simultaneously,
as the data were taken for two tones only, but the general
picture given above will still be true. It follows that the
sensation produced by a complex sound is different in character
as well as in intensity when the sound is increased or decreased
in intensity without distortion. In general, as the tone becomes
more intense, the low tones will become more prominent
because the high tones are masked. Due to the non-linearity
of the ear transmitting mechanism, the low-pitched tones
produce more subjective harmonics (harmonics in the sensed
sound but not in the original pressure variations) and for that
reason increase in loudness faster than the high-pitched tones.
It is a common experience of one working with complex sounds
to have the low frequencies always gain in prominence as the
sound is amplified. This phase of the subject will be discussed
again in a later section.

The question naturally arises: “Does the same interfering
effect exist when the two tones are introduced into opposite
ears instead of both being introduced into the same ear?” The
answer is “No.” Curves showing the results of such tests are
shown in Fig. 88.

For comparison the results for the case when the tones are
both in the same ear are given by the light lines. Take the
case of 1200 and 1300 cycles. It is rather remarkable that a
tone in one ear can be raised to 60 db, that is, increased in
intensity one million times, before the threshold value for the
tone in the other ear is noticeably affected. If the 1300-cycle
tone were introduced into the same ear as the 1200-cycle tone,
its sensation level would need to be shifted 40 db, corresponding
to a 10,000-fold magnification in intensity above its threshold
intensity in the free ear, before it could be heard. It is seen
that if one set of curves is shifted about 50 db it will coincide
with the second set. This strongly suggests that the
interference in this case is due to the loud tone being transmitted
by bone conduction through the head with sufficient energy
to cause masking.

Fig. 88. Binaural Masking. (Abscissa of each chart: level of Fp.)

That this is the case is substantiated by experiments on 
persons having unilateral deafness. If the telephone receiver 
is held to the deaf ear of such a person and the intensity of the 
tone gradually increased, the threshold value will be reached 
when it has a sensation level of approximately 50 db. That 
the sound has been transmitted to the good ear by bone 
conduction is shown by the fact that under such circumstances 
the tone is greatly enhanced by placing the finger in the good 
ear. Another fact which is discussed in more detail later is that 
binaural beats are most pronounced when the intensity of the 
tones in the two ears has a sensation level difference of approx- 
imately 50 db. 

These experiments indicate that when a receiver of the type 
used in these experiments is placed on the ear it communicates 
a certain amount of its vibration to the bones of the head. 
This vibration then is transmitted through the head to both 
ears, the intensity of the stimulation being approximately at a 
level of 50 db below that produced by the diaphragm of the 
receiver acting upon the air in the ear canal. It is claimed by 
some otologists that the vibration communicated to the cap 
of the telephone receiver used in practice is a great aid for a 
person having a certain type of deafness in transferring the 
speech vibrations to the end organ of hearing by means of 
bone conduction. It is seen from these experiments that in 
order for this to be true, the acuity of hearing by the air path 
must be at least reduced 50 db without in any way interfering 
with the acuity of hearing by the bone path. The value 50 db 
is entirely dependent upon the type of telephone receivers used. 
This value may be decreased or increased, depending upon the 
type of apparatus used in communicating the tones to the ear. 

This suggests that interference of room noise to telephone
conversation is not due principally to that which goes to the
free ear, but to that which gets into the same ear to which the
telephone receiver is being held, due mainly to leaks under the
receiver cap. Even when the receiver is held very tightly to
the ear, enough noise is conducted through the hard shell of the
receiver and then to the air in the auditory canal to produce
greater interference than that caused by noise coming into the
free ear. This conclusion has been confirmed by direct
experiments with telephone users.

Subjective Tones 

The sharp dips in the curves of Fig. 86 at frequencies corre- 
sponding to multiples of the masking frequency require explana- 
tion. These dips suggest that they may be produced by har- 
monics of the masking tone. For example, the curves for the 
masking tone having a frequency of 1200 cycles and sensation 
levels of either 100, 80, or 60 db look like those which would 
be produced if the masking were caused by three tones having 
frequencies of 1200, 2400, and 3600. A careful analysis of the
tone by means of the harmonic analyzer described in Part Two 
showed a frequency of only 1200 to be present. At the fre- 
quency corresponding to the dips, beats were plainly audible. 
This indicates that the masking tone creates harmonic fre- 
quencies in the ear. 

As stated in Chapter I, these harmonic frequencies are due 
to the non-linear response of the hearing mechanism. The 
tones introduced by this non-linearity are called subjective 
tones. The magnitude and the frequency of such tones may 
be determined by using the principle of beats mentioned above. 
If while the masking tone is present, an exploring tone is 
changed in frequency and intensity until the beats are most 
prominent, then the intensity and frequency of such an explor- 
ing tone can be taken as the intensity and frequency of the 
subjective tone. It was found that there are three classes of 
subjective tones which are called harmonics, summation tones, 
and difference tones, respectively. When a single tone
stimulates the ear, tones of the first class are produced. The
frequencies of such tones are exact multiples of the frequency
of the stimulating tone. When two tones stimulate the ear,
a series of subjective harmonics for each tone and also a series 
of difference and summation tones are produced. The sub- 
jective difference tones have frequencies which are equal to 
the differences obtained by subtracting the frequency of one 
tone from that of the other and also by subtracting the fre- 
quency of any harmonic from that of any other harmonic. 
Similarly, the summation subjective tones have frequencies 
which are obtained by taking sums instead of differences. 
When more than two tones stimulate the ear these three classes 
of tones are produced but the situation then becomes very 
complex. For example, it was found that the sensation levels 
of the first two subjective harmonic tones produced by a 
1200-cycle tone having a level of 80, were 60 and 50 db, 
respectively. For a level of 60, the subjective harmonics were 
20 and 15 db, respectively. For levels below 40, the subjective 
tones were undetectable. 

The results of some experiments reported by Wegel and 
Lane are given in the chart of Fig. 89. One tone, called the 
primary tone, was held at a constant level while a second tone, 
called the secondary tone, was varied both in frequency and 
intensity. The resulting sensations are represented on the 
chart. For levels below the masking curves only the 1200- 
cycle tone can be perceived. For levels just above the mask- 
ing curve and in the frequency range between 1200 and 2400 
cycles, only the primary and the difference tones can be per- 
ceived. In other words, in this region, the presence of the 
secondary tone is detected by hearing the subjective difference 
tone between the primary and the secondary tones. For 
higher levels, the primary, the secondary, and the difference 
tones can be perceived. For example, if the secondary tone 
is held at a frequency of 1600 and at a level of 60 db, the ear 
will perceive three tones, namely, the 1200-, the 1600-, and 
the 400-cycle tones. For still higher levels for the secondary,
very complicated mixtures of tones are perceived. 

A careful analysis was made of the mixture of tones present
in the ear when a primary tone of 1200 cycles at a sensation
level of 80 db was present along with a secondary tone of
frequency 700, and at the same level. The component
frequencies were determined by introducing an exploring tone and
determining the frequencies at which beats occur. If $f_1$
represents the primary, and $f_2$ the secondary, the frequencies found
in the mixture were $f_1$, 1200 cycles; $f_2$, 700; $f_1 + f_2$, 1900;
$f_1 - f_2$, 500; $2f_1$, 2400; $2f_2$, 1400; $3f_1$, 3600; $3f_2$, 2100;
$2f_1 + f_2$, 3100; $2f_1 - f_2$, 1700; $2f_2 + f_1$, 2600;
$2f_2 - f_1$, 200(?); $4f_2$, 2800; $2f_1 + 2f_2$, 3800;
$2f_1 - 2f_2$, 1000; $3f_1 + f_2$, 4300; $3f_1 - f_2$, 2900;
$3f_2 + f_1$, 3300; $3f_2 - f_1$, 900. No attempt was made
to determine their magnitudes, although approximate values
can be obtained by measuring the intensity of the exploring
tone at which the beats at each frequency are most prominent.
Such measurements are rather difficult when so many tones
are present. Except for the absence of frequency $4f_1$, this
series is all that would be expected (see Appendix C) if the
response of the ear were non-linear and represented by the
equation

$$x = a_0 + a_1 p + a_2 p^2 + a_3 p^3 + a_4 p^4$$

In this equation $x$ is the response of the mechanism of the
middle ear; $a_0$, $a_1$, $a_2$, etc., are constants; and $p$ is the pressure
in the ear canal. While frequencies introduced by higher
powers of the pressure were probably present, they were very
faint and no careful search was made for them.
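
The list of observed frequencies can be compared with what a power-series response predicts. It is a standard trigonometric result that powers of the pressure up to the fourth, applied to two tones $f_1$ and $f_2$, yield components at $|m f_1 + n f_2|$ with $|m| + |n| \le 4$. The sketch below enumerates these and compares them with the frequencies reported in the text; the function name is illustrative.

```python
from itertools import product


def combination_tones(f1, f2, max_order=4):
    """Positive frequencies m*f1 + n*f2 with 0 < |m| + |n| <= max_order."""
    tones = {}
    for m, n in product(range(-max_order, max_order + 1), repeat=2):
        order = abs(m) + abs(n)
        if order == 0 or order > max_order:
            continue
        frequency = m * f1 + n * f2
        if frequency > 0:
            # Record which multiples (m, n) produce this frequency
            tones.setdefault(frequency, []).append((m, n))
    return tones


# Frequencies reported in the text for f1 = 1200 and f2 = 700 cycles
reported = {1200, 700, 1900, 500, 2400, 1400, 3600, 2100, 3100, 1700,
            2600, 200, 2800, 3800, 1000, 4300, 2900, 3300, 900}

predicted = set(combination_tones(1200, 700))
print("predicted but not reported:", sorted(predicted - reported))
print("reported but not predicted:", sorted(reported - predicted))
```

The only predicted component missing from the reported list is $4f_1$, in agreement with the remark in the text.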

A study of the levels of a primary tone which is necessary
to produce detectable subjective harmonics reveals some
interesting and important data. To do this, the pure tone was
held at a convenient level while the presence of the harmonic
was determined by the beating effect produced by an exploring
tone as described above. In this way the sensation level for
tones of various pitches at which the second, the third, the
fourth, and the fifth harmonic first appeared was determined.

These data are shown in the curves of Fig. 90. For tones above
zero pitch no subjective harmonics appear until the sensation
level is 50 db, where the second harmonic just becomes
detectable. In the low-pitched range the harmonics appear when the
tone is very faint. For example, for a tone of pitch −4
octaves, which corresponds approximately to the frequency of the
alternating current usually used in electrical lighting systems,
the fifth harmonic appears before a sensation level of 25 is
reached. The sensation levels usually used while listening to
speech or music lie between 50 and 100 db, which shows that
the frequency spectrum impressed upon the inner ear must be
very different from that impressed on the outer ear, the
modifications being due to the non-linear transmission through
the middle ear.

Figure 90.

Fig. 91. Loudness Spectrum, Dotted Lines Introduced by Overloading in
Ear. (Loudness Equals 80 TU.)

To illustrate this, an estimate was made of the inner ear
spectrum which resulted from impressing an organ tone on
the outer ear. For this purpose, the data given above on
the non-linear properties of the ear were used. It was
assumed that the summation and difference tones were of the
same order of magnitude as the corresponding harmonics. The
total amplitude of any one frequency component was obtained
by taking the square root of the sum of the squares of all
the contributions to this frequency component. In Fig. 91
the results of such a calculation are shown. The top chart
gives the spectrum of the tone impressed upon the outer ear;
the bottom one the estimated spectrum impressed upon the
nerve terminals of the inner ear. It cannot be claimed that
this is an exact representation of the inner ear spectrum, for
the problem is too complicated for an exact solution, but it
probably represents the main facts. A comparison of the two
spectra in this figure illustrates the large differences which
are produced in the process of transmission.
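
The rule used for combining contributions to one frequency component (square root of the sum of the squares) can be stated compactly. The function name and the sample amplitudes below are illustrative and are not values from the organ-tone calculation.

```python
import math


def combined_amplitude(contributions):
    """Root-sum-square of all contributions to one frequency component."""
    return math.sqrt(sum(a * a for a in contributions))


# Illustrative amplitudes only: an impressed component plus one subjective
# harmonic falling at the same frequency
print(combined_amplitude([3.0, 4.0]))
```

This way of adding treats the contributions as having no fixed phase relation, so that the largest contribution dominates the total.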

It is thus seen that the fact as noted by Mayer that low
tones mask high tones much more readily than high tones
mask low tones is due to the non-linear response of the hearing
mechanism, which results in sending to the inner ear subjective
harmonics that excite the nerve terminals which would
otherwise be ready to receive the stimulation from the high tones.
On the other hand, the high tones can only produce a
stimulation in the regions near the maximum stimulation, that is, in
regions where the higher tones are sensed. Therefore, little
interference will be caused to the sensing of the low-pitched
tones.


Calculation of the Form of Vibration of the Basilar Membrane 

From the theory of hearing and the masking curves it is
possible to find an approximate form of vibration of the basilar
membrane. Let a primary tone of constant pitch and level
be impressed upon the ear. Then, to find the portions of the
basilar membrane which are vibrating, an exploring tone is
used. Those portions corresponding to pitches of the exploring
tone where no threshold shifts are produced are not vibrating
with amplitudes greater than those corresponding to the
threshold. The other portions are vibrating with amplitudes
which are measured approximately by the threshold shifts
of the exploring tone. This statement requires modifications
for those frequency regions close to the points where the
primary tone or its harmonics are sensed. In these regions
the presence of the secondary tone is recognized at a much
lower level than would otherwise be the case. The amplitude
caused by the exploring tone in these regions when its presence
is just detectable is very much smaller than the amplitude
caused by the primary tone. In these regions, however, a
good estimate of the amplitude can be made from the shape
of the rest of the curve representing the vibration form. The
amplitude corresponding to the pitch of the primary tone is
measured directly by the sensation level of the primary tone.
Similarly, the amplitude of each of the subjective harmonics is
measured by the sensation level of the exploring tone when
the best beats are produced with the subjective tone.

For convenience in speaking of the amplitude of vibration 
of the basilar membrane, a new term, nerve sensation level, 
will be defined. At each position along the basilar membrane 
it has been seen that there corresponds a tone of definite pitch. 
When a sound is being sensed by the ear, the nerve sensation 
level corresponding to each position on the basilar membrane 
is the sensation level of the pure tone corresponding to that 
position which would produce the same amplitude. 

Using an exploring tone in the manner described above, 
F. H. Graham of Bell Telephone Laboratories made a careful 
study of the nerve sensation levels¹ produced by an 800-cycle
tone with the result shown in Fig. 92. Separate curves are 
shown for several different intensities. The peaks in the 
curve occur at positions corresponding to frequencies which 
are multiples of 800 cycles. The points marked (f?) were 
obtained by the best beat method; the points marked (•) 
were obtained by noting the threshold shift. The scale at the 
bottom gives the position on the basilar membrane and the 
scale at the top denotes the frequency of vibration of the 
exploring tone. 

¹ A nerve sensation level of 80 at 1200 cycles corresponds to an intensity level of
−12. Consequently, the power in a free wave corresponding to this level is
microwatts. The amplitude of vibration in water corresponding to this is $3 \times 10^{-7}$
centimeters, which is about 10 times the diameter of the molecule. The amplitude
of the basilar membrane for producing this sensation level is probably not more than
100 times this value. Consequently at the threshold the displacement is only a small
fraction of the diameter of the molecule.

Fig. 92. Vibration Form of Basilar Membrane Produced by a Pure Tone of
800 Cycles.

In a similar way the nerve sensation level produced by three
tones having frequencies of 1000 cycles, 1500 cycles, and 2000
cycles, respectively, at sensation levels of 80 db, were obtained,
the results being shown in Fig. 93. The common difference
tone produces the masking in the region corresponding to
500 cycles. The subjective difference tone 2000 − 1000
reinforces the 1000-cycle tone. The summation tone 2000 +
1000, the second harmonic of 1500, and the third harmonic
of 1000 unite to give the subjective tone 3000, which was
definitely measured as shown in the figure.

It is seen that except for frequency regions near those 
corresponding to the masking tone and its harmonics, the 
masking curves become the form of the vibration of the 
basilar membrane by a simple transformation of variables used 
in plotting, the frequency scale being converted into a distance
scale along the basilar membrane, and the threshold shift scale 
into an amplitude scale. By keeping this in mind, the masking 
curves as shown in Fig. 86 will convey a good notion of the form 
of the vibration of the basilar membrane when excited by the 
pure tones indicated. 

Fig. 93. Vibration Form of Basilar Membrane Produced by the Three Pure
Tones: 1000 Cycles, 1500 Cycles, 2000 Cycles.

Figure 94 shows the form of the vibration of the basilar
membrane when the intensity level of the exciting primary tone
is −32 db, corresponding to a pressure of one-half bar in the
external ear canal. The dotted curve is the locus of the
maxima for all frequencies which are impressed upon the
ear at this intensity level. As pointed out in Wegel and
Lane’s paper,¹ the curves become less sharp as the frequency
is decreased. This is in agreement with what would be
expected from the dynamical structure of the cochlea. At
very low frequencies the stimulus may be conceived as due to
a more or less bodily motion of the tectorial membrane along
the basilar membrane.

¹ “Auditory Masking and Dynamics of the Inner Ear,” Physical Review, February,
1924.

It is to be expected that similar curves at extremely high
frequencies should become less definite, probably not by be-
coming flatter, but by having their maxima at or beyond the
proximal end of the organ of Corti. This conclusion is arrived
at principally from a consideration of the curves of absolute
sensitivity of normal ears in which the sensitivity is seen to
drop off very sharply at about 15,000 cycles.¹ This sort of an
assumption is further substantiated by the fact that when
plotted as displacement of the basilar membrane, sensitivity
curves of abnormal ears in which the lesion can be reasonably
well traced to degeneration of the nerves of the proximal end
of the basilar membrane, also indicate similar sharp cut-offs
at frequencies much lower than 15,000 cycles, whereas no such
abrupt cut-offs have yet been recorded at the low end of the
frequency scale.² According to this theory of the action of
the cochlea, it follows that as long as there is sensitivity in
any of the nerves, even if it is only in a small region, the ear
will be able to detect a tone of any frequency if it is sufficiently
intense unless the necessary intensity is greater than the person
can endure.

¹ For original data see C. E. Lane, Physical Review, 19, 492, May, 1922.

² For example, see paper by Dr. E. P. Fowler and R. L. Wegel, “Audiometric
Methods and Their Applications,” Transactions of the American Laryngological,
Rhinological, and Otological Society, Inc., 1922.

Fig. 94. Amplitude Along Basilar Membrane for Different Frequencies
(R.M.S. Pressure .5 Dynes.)

A plan view of the basilar membrane is shown drawn to 
scale at the top of the figure. Conjectured contour lines are 
drawn enclosing areas over which the amplitude is more than 
one-half that of their centers. The lengths of these areas are 
obtained from the curves shown in the figure and their widths 
by taking one-half the width of the membrane. 


Masking Effects of Complex Sounds

The masking effects of complex sounds are what one would 
expect after considering the data on pure tones. When the 
masking sound is composed of two components, the masking 
is a combination of the masking produced by each component 
plus the effects due to the summation and difference subjective 
tones. In the previous section such curves were described for
a complex tone with components having frequencies of 1000, 
1500, and 2000 cycles and having the common sensation level 
of 80 db. 

Masking curves for a musical sound having its components 
60 cycles apart are shown in Figs. 95, 96, 97 and 98. The 
pressure exerted on the ear drum by each component is given
by the lower curve on each chart. Figure 95 shows the effect
when the components of appreciable size are above 3000 cycles;
Fig. 96 when they are above 1500 cycles; Fig. 97 when they are
below 1500 cycles; and Fig. 98 when they are below 500 cycles.
No attempt was made to explore the detail of the curve near 
each component. 

As stated in the chapter on “Noise,” the threshold shift
or masking curve of a complex sound may be taken as a meas- 
ure of the annoying effect of the noise. It shows the reduction 
of the capacity of the ear to sense sounds in the presence of 
such a noise. Experimental tests have shown that the thresh- 
old shift of the speech sounds produced by noise is approxi- 
mately the same as the average shifts produced on a 500-, a 
1000-, and a 2000-cycle tone as shown in the noise audiogram. 
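
The averaging rule just stated can be illustrated with a short Python sketch. The function name and the sample readings are hypothetical, chosen only to show the arithmetic; they are not data from the text.

```python
def speech_threshold_shift(shift_500, shift_1000, shift_2000):
    """Estimate the threshold shift for speech (db) as the plain average
    of the shifts measured on 500-, 1000- and 2000-cycle tones."""
    return (shift_500 + shift_1000 + shift_2000) / 3.0


# Hypothetical noise-audiogram readings in db (illustrative values only).
estimate = speech_threshold_shift(35.0, 40.0, 45.0)
print(f"Estimated speech threshold shift: {estimate:.1f} db")
```

The estimate is approximate, as the text itself says, and is no substitute for a direct test with speech sounds.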



Figure 95. 


Figure 96. 



Figure 97.


Figure 98. 


The masking effect of noise in everyday life is very great.
Table XX gives an idea of the noise effects produced in various
familiar places. The amount of noise is measured by the
threshold shift produced on the 3-A audiometer tone. Such 
shifts are approximately the same as would be produced for 
speech. These figures show how impossible it is to reach an 
audience of any size with an ordinary speaking voice unless the 
noise is kept at a low level. 


TABLE XX

Amount of Noise:        Maximum Distance for       Typical Place for Such Noise
Threshold Shift (db)    Hearing Average Speech

  0                     1250 feet                  Soundproof booth
 10                     395 feet                   Country residence
 20                     125 feet                   Quiet office
 30                     40 feet                    Average office
 40                     12 feet                    Noisy office or department store
 50                     4 feet                     Railway train or automobile
 60                     15 inches                  New York subway
 70                     5 inches
 80                     1.5 inches                 Boiler factory
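
The distances in Table XX shrink by a nearly constant factor for each 10 db of threshold shift. The Python sketch below reproduces that pattern under an assumption the text does not state: that speech intensity falls off as the inverse square of the distance, so that every 20 db of shift divides the distance by 10. The function name and the 1250-foot default are illustrative only.

```python
def max_speech_distance_feet(threshold_shift_db, quiet_distance_feet=1250.0):
    """Distance at which average speech is just heard, assuming (for
    illustration only) inverse-square spreading, so that each 20 db of
    threshold shift divides the distance by 10."""
    return quiet_distance_feet / 10 ** (threshold_shift_db / 20.0)


for shift in range(0, 90, 10):
    feet = max_speech_distance_feet(shift)
    if feet >= 2.0:
        print(f"{shift:2d} db: {feet:7.1f} feet")
    else:
        print(f"{shift:2d} db: {feet * 12:7.1f} inches")
```

Under this assumption the sketch gives 15 inches at 60 db and 1.5 inches at 80 db, matching the tabulated entries. Rooms with strong reflections would not follow so simple a law.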



CHAPTER V 


Binaural Beats 

When listening to a sound produced in the air normally 
both ears are used and certain sensations are dependent upon 
this fact. Changing the normal condition for such listening 
produces certain effects known as binaural effects. Some of 
these effects have already been mentioned because they were 
directly related to the material being discussed. These and 
other effects are described in this chapter and their bearing on 
the theory of hearing discussed. 

Pure binaural effects are probably produced in the brain 
itself and on account of the difficulty of interpreting the sensa- 
tions produced, there have been very large differences in results 
obtained by different observers. In general, it has been found 
that the two ears aid in locating the direction and distance of 
a sound source. This is particularly true of sources giving
pure tones, for it has been found that persons having a total 
loss of hearing in one ear have great difficulty in locating the 
direction of such tones. Even when using both ears, however, 
it is very difficult to locate the source of pure tones when the 
pitch is high. 

The Effect of Intensity and Phase upon the Apparent Location 
of Sound Images 

If a sound is conducted from a source to the two ears by 
different acoustic paths, some very interesting effects are 
produced. If the apparatus is arranged so that the phase in 
one of the paths can be changed, the following sensations are 
experienced by most persons who try the experiment. When 
the difference in phase between the two paths conducting the 



THE EFFECT OF INTENSITY AND PHASE 189 


iA sound to the ear is zero^ then the source appears to be in the 
median plane, that is, directly in front of the individual. As 
the difference in phase increases, the sound image travels, 
apparently, along a circle toward the ear which is leading in 
phase. When the image appears to be opposite the ear, it 
suddenly seems to jump through the head to the other side, 
finally coming back again to the median plane. 

Several methods of producing the phase difference have 
been used by various investigators. The results obtained by 
G. W. Stewart and his students seem to be more consistent 
than those obtained by any of the other observers. For a source 
of sound he used an instrument called the “Phaser.”¹ This
instrument consisted essentially of a rotating toothed wheel 
with two telephone bipolar receiver magnets placed radially 
close to the teeth and capable of being separated from each 
other by a variable known number of degrees of rotation of the 
wheel. The currents induced in the windings of these small 
magnets were conducted to two telephone receivers which 
were placed on the ears of the observer. It was thus possible 
to secure any difference in phase between the sounds at the 
two ears by moving the bipolar receiver magnets to different 
positions around the toothed wheel. By including selective net-
works in the circuit, tones of a fair degree of purity were
obtained. The observer sat at the center of a circular scale 
and was asked to point in the direction from which the sound 
appeared to come. 

A second method made use of two tuning forks placed at 
the end of rubber tubes, the sound being conducted to the ear 
by means of stethoscopes connected to these tubes. The 
tuning forks were slightly out of tune so that the difference 
in phase would change periodically. Under such circum- 
stances, the image appeared to rotate around the head. By 
recording the time necessary for the image to rotate through a 
given angle, and comparing it with the time to produce a 
complete cycle, results were obtained which showed the relation 


^Physical Review, May, 1920, p. 433. 



SPEECH AND HEARING 


190 

between the phase shift at the ears and the angular displace- 
ment of the sound image. 

In a third method the source of sound was two tuning forks 
which were electrically driven with the same frequency. The 
sound was conducted to the ear by means of stethoscope tubes 
as described above. Means were provided for changing the 
attenuation in the acoustic paths so as to keep the intensity 
at the ear constant while varying the phase. The phase dif- 
ference was produced by choosing different lengths of the air 
path from the tuning forks to the ears. 

In all these methods it was found that within the experi- 
mental error the angular displacement of the image was pro- 
portional to the difference in phase at the two ears, the constant 
of proportionality varying somewhat with frequency. The 
following formula represents the average results obtained by 
such experiments: 

$$\frac{\phi}{\theta} = 0.0034 f + 0.8 \quad \text{(approx.)}$$

where $\phi$ is the phase difference at the two ears, $\theta$ the angular
displacement of the sound image from the median plane, and
$f$ the frequency of vibration. For example, the phase difference
of a pure tone having a frequency of 500 cycles is always about
2.6 times the angular displacement of the sound image. When
the phase difference has reached 180°, the image has been
displaced 70°. As the phase difference increases, the image
jumps to 70° on the opposite side and then again returns to the
median plane when the phase difference reaches 360°. For a pure tone of
60 cycles the phase difference is approximately equal to the
angular displacement of the image.
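
As a numerical check on the empirical formula, the Python sketch below evaluates the ratio of phase difference to angular displacement at a few frequencies. It assumes the constants in the printed formula read 0.0034 and 0.8; the function names are illustrative and are not Stewart's.

```python
def phase_per_degree(frequency_cycles):
    """Ratio of interaural phase difference to angular displacement,
    using the approximate fit phi/theta = 0.0034 f + 0.8."""
    return 0.0034 * frequency_cycles + 0.8


def image_displacement_deg(phase_difference_deg, frequency_cycles):
    """Angular displacement of the sound image from the median plane
    (degrees) for a given phase difference (degrees)."""
    return phase_difference_deg / phase_per_degree(frequency_cycles)


for f in (60, 250, 500):
    ratio = phase_per_degree(f)
    theta = image_displacement_deg(90.0, f)
    print(f"{f:4d} cycles: phi/theta = {ratio:.2f}, "
          f"90 deg of phase -> image near {theta:.0f} deg")
```

With these constants the ratio comes out at 2.5 for 500 cycles and very nearly 1 for 60 cycles, in fair agreement with the figures quoted in the text.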

If the phase of sound reaching the two ears is kept constant 
while the intensity is varied, an angular displacement in the 
sound image is also produced. However, the results are very 
much less definite and the variations from one individual to 
another are very great. Stewart concluded from his results 
that for any one individual the angular displacement of the 
image was proportional to the difference in intensity level at
the two ears. However, his results show that for some individ-
uals the angular displacement for a given difference in intensity
level was twice as great as for others. As the frequency of
the exciting tone approaches about 1000 cycles, the uncertainty
of locating the image becomes greater, and for many persons
there is no localization for the higher frequencies.

According to the theory outlined in Chapter I, the phase 
difference produced at the two ears is preserved in the com- 
posite nerve current going to the brain. The discharges from 
any particular nerve fibre occur at intervals which are exact 
multiples of the period of the sound wave. As the intensity of 
the sound becomes greater, this interval becomes less and
approaches the period of the sound wave. The effect of all 
of the impulses from the individual nerve fibres is to produce 
a stimulation pattern in the brain which has a periodicity of the 
sound wave. The maximum stimulation at the brain centre 
is definitely related to the maximum pressure in the sound 
wave in front of the ear drum. The interval of occurrence 
between the two depends upon the time of transmission of the 
mechanical vibration from the drum of the ear through the 
middle ear and through the cochlea to the basilar membrane, 
and also the time of transmission of the nerve impulses from 
the nerve endings on the basilar membrane to the brain. The 
two maxima produced in the brain from the stimulation coming 
from the two ears will occur at approximately the same time 
if the phase of the sound vibration at the two ears is the same. 
However, when this phase is different there will be a correspond- 
ing time interval between the occurrence of the two maxima 
produced in the brain. It is undoubtedly the recognition of 
this time interval that enables us to recognize phase difference. 
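
The time interval implied by a given phase difference is easily computed for a pure tone. The Python sketch below is a minimal illustration; the function name is hypothetical, and it leaves out the transmission times through the ear and the nerve, which the argument above takes to be the same on both sides.

```python
def interaural_time_interval_seconds(phase_difference_deg, frequency_cycles):
    """Time between corresponding pressure maxima at the two ears for a
    pure tone: the fraction of one period set by the phase difference."""
    period = 1.0 / frequency_cycles
    return (phase_difference_deg / 360.0) * period


# A 90-degree phase difference at 500 cycles is a quarter of a 2-millisecond period.
interval = interaural_time_interval_seconds(90.0, 500.0)
print(f"Interval: {interval * 1000:.2f} milliseconds")
```

For a fixed phase difference the interval shrinks as the frequency rises.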

As pointed out by Hartley and Fry,¹ when the experiments
are performed as indicated above, a new experience is produced
which is different from that produced by a source of sound
being actually present. If a sound source is placed in the air
and moved about, the change in phase and intensity produced
at the two ears must take place together in a certain way so
that any experiments which produce sounds which keep either
the phase or intensity constant, produce an experience different
from that ordinarily obtained when listening to actual sound
sources. The fact that the experiments on the change in the
apparent image with change in phase give much more definite
results than those obtained when the phase is kept constant
and the intensity varied shows that the mind can more readily
reconcile differences in intensity than differences in phase.
Hartley and Fry give a series of calculated curves showing how
the phase and intensity vary as a sound source is moved about
the head. It is only when reproduced sounds have phases and
intensities corresponding to points on these curves that localiza-
tion as perfect as that obtained from an actual source would be
expected.

¹ Physical Review, December, 1921, p. 431.

Binaural Location of Complex Sounds 

During the war the binaural location of complex sounds 
became very important. Its use made it possible to locate 
enemy submarines and aeroplanes. It was found that when 
two transmitters connected separately to two receivers were 
used in picking up the sound and transmitting it to the two 
ears, the direction of a complex source of sound could be 
located by the individual listening. To obtain a complete 
duplication of the binaural effect produced without such a 
transmission system, the two transmitters must be mounted on 
something equivalent acoustically to the head and at positions 
corresponding to the two ears. Then if the transmission sys- 
tem transmits faithfully the phases and amplitudes of the com- 
ponent sounds to the ears, the same auditory sensation will be 
produced as that obtained when the head is placed in the 
position where the artificial head carrying the transmitters is 
located. Any variation from this ideal transmission system 
will result in producing results which are different from those 
ordinarily produced by direct listening. 

Experiments have shown that a considerable departure from 
this ideal may be made and yet a fairly good sense of localiza-
tion be obtained. As was the case with pure tones, so with
complex tones it was found that the phase was the controlling 
factor. For this reason, phase compensators were introduced 
into such a binaural transmission system so as to bring the 
apparent location of the sound directly in front of the observer. 
The amount of compensation necessary to do this indicated the 
position of the complex sound. The very fact that such com- 
pensators will not produce the proper shifts for all the com- 
ponents, indicates that the localization of such a compensated 
transmission system will be somewhat indefinite. When the 
source of sound is a combination of a few pure tones the com- 
pensated transmission system might very well produce a shift- 
ing of these components to different apparent positions with 
the result that either images in several directions are formed 
or confusion produced so that no localization is obtained. The 
former effect has been observed and described by Bowlker. 

The location of a complex sound under ordinary conditions 
is very definite and may be made even by a person who is totally 
deaf in one ear. It is difficult to point out definitely all the 
factors which contribute to this ability. Certainly the reflec- 
tions in the room under ordinary circumstances give consider- 
able aid. In any event, the phases and amplitudes of the com- 
ponents must change in a prescribed way. Our experience 
with such sounds has given us an education so that we uncon- 
sciously know the way these changes take place as the source is 
moved into different positions. According to this view, it 
seems reasonable to expect that new complex sounds would be 
very much more difficult to locate than those with which we 
are ordinarily familiar. 

Binaural Beats 

If by means of two telephone receivers connected to vacuum 
tube oscillators tones of slightly different pitch are introduced 
into each ear, the sensations described below will be produced. 
Let us assume that one tone, called the “primary,” is kept at
a sensation level of 80 db in the right ear. When the sensation
level of a second tone, called the “secondary,” is at about 10
db in the left ear, then faint beats are observed. As the sensa- 
tion level of the secondary is increased these beats become 
more pronounced, reaching a maximum at a level of 30 db. 
For higher levels the beats again become fainter until a level of 
45 db is obtained, above which they are not heard. These beats, 
called “objective” beats, produce sensations which correspond 
in every way to those produced when the primary tone is 
reduced in sensation units about 50 db and introduced directly 
into the same ear as the secondary. These objective beats 
are undoubtedly due to physical interference in the left ear, 
the vibrations coming from the right ear by means of bone 
conduction. This conclusion is confirmed by the fact that the 
difference in intensity level for best beats increases when soft 
rubber pieces are placed under the receiver caps. It is impor- 
tant to remark here that this phenomenon is one which can 
easily be observed by anyone who tries it and beats are pro- 
duced for all frequencies. 
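
The arithmetic behind the objective beats can be put in a few lines of Python. The sketch assumes a single fixed loss of about 50 db for bone conduction across the head, as the figures above suggest; the function name and the default value are illustrative, and the loss will in practice differ with the observer and the receiver mounting.

```python
def best_beat_secondary_level_db(primary_level_db, head_attenuation_db=50.0):
    """Secondary-tone level giving the most pronounced objective beats.
    Beats are strongest when the two interfering tones are equal, that is,
    when the secondary matches the primary as it arrives by bone conduction."""
    return primary_level_db - head_attenuation_db


level = best_beat_secondary_level_db(80.0)
print(f"Best objective beats expected near {level:.0f} db in the opposite ear")
```

A larger assumed loss, such as soft rubber cushions under the receiver caps would give, lowers the level for best beats accordingly.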

Keeping the primary again at a sensation level of about 
80 db in the right ear it will be found that for a sensation level 
of 60 db for the secondary tone in the left ear beats again 
appear having a maximum effect when the tones in both ears 
have the same sensation level. These beats called “subjec- 
tive” beats disappear when the secondary tone is at a level of 
100. These subjective beats are entirely different in character 
from those called objective beats. Some persons cannot hear 
them at all, and others report results which are quite discordant. 
It is with these subjective beats that most of the experiments 
on binaural beats during the past century have been con- 
cerned. For this reason it is not surprising that there was so 
much disagreement between results reported. In a good many 
of these experiments, tuning forks or other sources of sound 
were used which made it difficult to prevent the sound from 
one source going into both ears and thus directly producing 
beats of the ordinary kind. 

The work of Stewart¹ showed that seventeen out of
twenty-three observers were able to hear beats sufficiently distinct to
report them. The work of Lane² showed that eighteen out
of twenty-two were able to hear the beats sufficiently well to
determine their period. The kinds of sensation reported by
the different observers were greatly different.

¹ Physical Review, June, 1917, p. 502.

A summary of the essential facts concerning subjective beats
as given by Lane is as follows:

“(a) If two tones of equal intensities and nearly the same fre-
quencies are simultaneously presented to opposite ears, the beat
frequency can be recognized by about 80 per cent of the observers,
provided the frequencies of the beating tones are less than 800 or
1000 cycles. For higher frequencies the beats cannot be heard.

“(b) If the beats are slow, the one outstanding phenomenon
observed by all who recognized the beat is an alternate left and
right localization of the sound, localization being on the side of the
tone leading in phase.

“(c) Most observers who hear the slow beats experience a more
or less vague notion of the localization travelling along some path
through the median plane when the localization shifts from one side
of the head to the other, but there is no good agreement among the
observers as to the position of this path.

“(d) The passing of the localization through the median plane
is generally more clearly defined during phase agreement than during
phase opposition.

“(e) While all observers who heard the slow beats reported
without any previous suggestions the existence of the alternate right
and left localization, none reported any intensity maxima until
questioned as to the existence of such maxima. However, when
questioned, over 80 per cent of the observers reported maxima
corresponding to one or more of the following three phase relations:
(1) phase agreement, (2) 30° or 40° before opposition and (3) 30°
or 40° after opposition. There was no good agreement among the
observers and several during the course of the experiments shifted
from one phase relation to another in their report on the time of
intensity maxima.

“(f) For fast beats the chief sensation is that of an intensity
fluctuation of the sound located somewhere within the head.

“(g) For intermediate beats some reported a predominating
sensation of motion, others of intensity fluctuation and still others
seemed to experience both sensations about equally well and could
direct their attention upon either at the sacrifice of the other.

“(h) Subjective beats are heard equally well for tones introduced
into the ears by means of telephone receivers with and without
receiver cushions or presented by means of rubber tubes.

“(i) So long as the two tones are of equal intensity, the hearing
of subjective beats may be heard about equally well for all intensity
levels of the two tones.”

² Physical Review, September, 1925.

It is important to note that while the objective beats are 
readily observed by all, the phenomenon of subjective beats is 
quite indefinite and differs very greatly with individuals; also, 
that no subjective binaural beats are obtained for frequencies 
higher than 700 or 800 cycles. 

It seems clear that the phenomenon of subjective beats is 
one which is produced in the brain and is closely associated 
with the binaural localization described in the last section. 
The term “beat” is hardly descriptive of the phenomenon 
since the sensation obtained is that of a wandering localization. 
It is only occasionally that we notice the intensity maxima and 
they are usually associated with positions of best localizations 
rather than positions of maximum loudness. Also, since the 
intensity and phase relations do not correspond to those ordi-
narily experienced in locating a source of sound, the psycholog-
ical reaction of the observers must be that of experiencing a 
new sensation. 

Other Binaural Phenomena 

The following experiment suggested by Dr. H. D. Arnold
showed some very interesting effects. A high quality trans- 
mission system was provided with a filter system so that all 
the frequency components below 1000 cycles were sent into 
one channel and delivered to the left ear. Those above 1000 
cycles were sent into another channel and delivered to the 
right ear. When speech was transmitted over such a system 
there was apparently no distortion produced, although if either 
one or the other of the two receivers were taken away the 
speech sound was very distorted and it was hard to recognize 
what was being said. When both receivers were used, the 
speech seemed to be good quality and no difficulty was experi- 


OTHER BINAURAL PHENOMENA 


197 


enced in following what was being said. Apparently^ in this 
case^ the brain was able to combine the sounds obtained from 
the two ears to complete the proper picture. However^ when 
music was transmitted, a different situation resulted. This 
was particularly true when listening to music from the piano. 
In this case the tones appear first in one ear and then in the 
other ear depending upon the pitch. This causes confusion 
and gives a very weird sort of sensation. When listening to 
sounds which have frequencies fairly well scattered in both the 
ranges below and above 1000 cycles, the sensation produced 
was about the same as that obtained by combining the fre- 
quencies into the same ear. When the sounds were predominat- 
ing in either one or the other ear, localizations were produced 
first at one ear and then at the other as described above. 



CHAPTER VI 


Methods of Testing the Acuity of Hearing 

The kinds of hearing tests that are needed may be classified 
into four groups according to their purpose as follows: 

1. Industrial, or those made to determine the fitness of
   a candidate for employment. In certain types of
   work it is particularly important that a prospective
   employee meet a definite requirement for acuity of
   hearing. Tests made in the army and the navy for
   various branches of service are conspicuous examples
   of this kind of test.

2. Educational, or those made to determine the degree of
   hearing of school children in both public schools and
   schools for the deaf to determine the proper educa-
   tional methods.

3. Clinical, or those made to assist the physician to make a
   proper diagnosis of the cause of deafness.

4. Research, or those made to determine new facts about
   both normal and abnormal hearing.

Different methods are appropriate for these different pur- 
poses. Unless the particular purpose is kept in mind, there is 
apt to be confusion when discussing the merits of a method of
testing hearing. 

It is highly desirable that a single scale, independent of the
method of testing and of general application to all the purposes
mentioned above, be used for representing the degree of hearing.
Therefore, before describing the apparatus developed for these
various purposes, it is well to describe such a scale of hearing
which is coming into use by otologists and which may be used
for all of these purposes. It will then be shown how the results
obtained from the commonly made voice, watch tick, acoumeter,
coin click, and tuning fork tests can be expressed in hearing
loss units on this scale.

The sensation units used in describing the intensity levels 
of sounds are convenient and logical for defining the degree of 
deafness. The hearing loss is measured by the threshold shift 
from the average threshold intensity level for normal ears. In 
other words, the sensation level of a tone which can just be 
heard by the person being tested is the hearing loss (H.L.) 
expressed in sensation units. Expressed mathematically, 

$$\text{H.L.} = 10 \log \frac{I}{I_0} \qquad (1)$$

where $I$ is the necessary intensity of the sound for hearing by
the person being tested and $I_0$ the normal threshold intensity.
This equation gives the hearing loss for any sound: pure tones,
musical sounds, watch tick, voice, tuning forks, etc.
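
Equation (1) can be evaluated directly. The Python sketch below assumes the logarithm is taken to base 10, as is usual for sensation units. The function name and the sample intensities are illustrative; any unit of intensity serves so long as both values use the same one.

```python
import math


def hearing_loss_db(intensity, normal_threshold_intensity):
    """Hearing loss in sensation units: 10 log10(I / I0), where I is the
    intensity the listener needs and I0 the normal threshold intensity."""
    return 10.0 * math.log10(intensity / normal_threshold_intensity)


# A listener who needs 1000 times the normal threshold intensity.
print(f"Hearing loss: {hearing_loss_db(1000.0, 1.0):.0f} sensation units")
```

A listener whose threshold equals the normal threshold has a hearing loss of zero on this scale.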

In a paper by Fowler and Wegel,¹ a hearing scale was pro-
posed which has been objected to by some otologists because
it is dependent upon the threshold of feeling as well as the 
threshold of hearing. On this scale the per cent hearing loss is 
the hearing loss in sensation units divided by the number of 
sensation units between the threshold of hearing and the 
threshold of feeling for an average normal ear. It is undoubt- 
edly the best answer to the practical question as to what is the 
per cent hearing loss, and is very useful In expressing general 
results. 

It is sometimes convenient to give a figure which represents 
the average per cent loss of hearing. It seems reasonable and 
logical to choose for this the fractional part of the normal 
auditory sensation area corresponding to tones which cannot 
be properly sensed by the person being tested. This is approximately² 
equivalent to the ratio of the number of distinguishable 
tones in the hearing range of the person being tested to 
the number of distinguishable tones in the normal hearing range. 

¹ "Audiometric Methods and Their Applications," published in Transactions of the 
American Laryngological, Rhinological, and Otological Society, Inc., 1922. 

² This would be exactly equivalent if the values of Aa and AP in Fig. 83 were equal 
for all positions within the inclosed area of this figure. 

The charts shown in Fig. 99 illustrate the meaning of this 
definition of average hearing loss. The charts give the results 
for six cases of deafness. The line separating the light from 
the dark areas is the threshold curve for the deafened person. 

Fig. 99. Audiograms for Typical Cases of Deafness. 

It is frequently more useful to the deafened person to know 
the per cent hearing loss for speech rather than the average per 
cent loss which is obtained in the manner just described. This 
is readily done if speech sounds are used in making the test. 
As is shown later, it may be calculated from the audiogram. 
For most purposes it is desirable to express the hearing loss in 
sensation units rather than per cent hearing loss. 

Conversion of Hearing Loss in Sensation Units to Per Cent 
Hearing Loss 

The hearing loss expressed in sensation units can be converted 
into per cent hearing loss by multiplying the former by 
a factor K which varies with the type of sound used in the test. 
It is obvious from the definition given above that this factor is 
equal to 100 divided by the number of sensation units between 
the normal threshold of hearing and the normal threshold of 
feeling for the particular sound used. Experiments made in 
Bell Telephone Laboratories have indicated that for speech 
this factor is .83; for the test tone of the 3-A audiometer it is 
equal to 1.0; for a watch tick, coin click, or acoumeter it is 
approximately 1.5. For pure tones the value of K is dependent 
upon the pitch and is given in Table XXI. The question 
marks indicate that the values of K for tones having either 32, 
48, or 12,800 cycles per second have not been accurately 
determined. 
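
The conversion just described amounts to a single multiplication. The sketch below is illustrative only: the constant and function names are assumptions, while the three K values are those quoted in the paragraph above.

```python
# K factors quoted in the text for three kinds of test sound.
K_SPEECH = 0.83
K_3A_TONE = 1.0
K_WATCH_TICK = 1.5  # also applies to the coin click and acoumeter


def per_cent_hearing_loss(hearing_loss_db, k):
    """Convert a hearing loss in sensation units to per cent hearing loss."""
    return k * hearing_loss_db


def sensation_units_between_thresholds(k):
    """Range between the thresholds of hearing and feeling implied by K."""
    return 100.0 / k


# A 60 db loss for speech is very nearly a 50 per cent loss.
print(round(per_cent_hearing_loss(60.0, K_SPEECH), 1))
# K = .83 implies roughly 120 sensation units between the two thresholds.
print(round(sensation_units_between_thresholds(K_SPEECH), 1))
```

Because K is defined as 100 divided by the range between the two thresholds, a loss equal to that whole range always converts to 100 per cent.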


TABLE XXI 

Factors for Converting Hearing Loss Expressed in Sensation Units into 
Per Cent Hearing Loss 

Vibration                    Vibration 
Frequency     K              Frequency     K 

    32        3.3 (?)           800        .77 
    48        2.2 (?)         1,024        .76 
    64        1.5             1,600        .77 
   100        1.28            2,048        .77 
   128        1.09            3,200        .81 
   200        .96             4,096        .86 
   256        .91             6,400        .94 
   400        .83             8,192        1.22 
   512        .79            12,800        1.42 (?) 





Relation between Sensation Units Hearing Loss and the Maxi- 
mum Hearing Distances for Speech 

Let us now consider the relation between the hearing loss 
expressed in these sensation units and that expressed in the 
usual voice test method. In this latter method the tester pronounces 
words while slowly moving away until the patient just 
fails to interpret them. The degree of hearing obtained from 
such tests usually is expressed by a ratio of two numbers: 
the numerator is the distance at which the patient 
can hear and the denominator the distance at which a person with 
normal hearing can hear. If the room in which such tests are 
made is acoustically treated so that no sounds are reflected 
from the walls, ceiling, or floor, the intensity of the sound 
decreases as the inverse square of the distance. In a room 
with numerous drapings and a heavy carpet, this condition is 
approximately fulfilled. Laboratory tests have indicated that 
for an average speaker calling numbers with an intensity 
corresponding to the musical notation mf (mezzoforte) and 
having the lips at a distance of 1½ inches from the ear, the 
intensity is 90 db above the normal threshold of audibility; 
that is, under such conditions the speech has a sensation level of 
90 db. These tests have shown also that the intensity is 
80 db above the intensity necessary for a 50 per cent correct 
interpretation of called numbers. For a voice corresponding 
to pp (pianissimo) the sensation level is about 15 decibels lower 
and for a voice corresponding to ff (fortissimo), about 15 decibels 
higher. Approximately the same results are obtained when 
using either a pp voice or a loud whisper. An average whisper 
is about 15 db lower than a loud whisper. 

Using these data, the inverse square law, and the definition 
of hearing loss given above, it is possible to calculate for each 
intensity of calling the maximum distance for hearing and 
interpreting called numbers by a person of known hearing loss. 
The necessary equations for doing this are developed in 
Appendix D. In this appendix it is shown also that the 
maximum distance at which the normal ear can interpret called 
numbers is 40 feet for the average whisper, 222 feet for the loud 
whisper or pp voice, 1250 feet for the mf voice, and 1⅓ miles 
for the ff voice. At first thought, these numbers seem to be 
unreasonably large to conform with our every-day experience, 
but it must be remembered that we are usually immersed in a 
continuous noise. This is especially true in the large cities. 
One needs only to recall the common experience while in the 
country on an early morning of hearing the cock crow and 
other familiar sounds at a distant farmhouse to realize that 
the voice will reach these long distances in a very quiet place. 
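
The distances quoted above can be checked with a short calculation. The sketch below is not the derivation of Appendix D, which is not reproduced here; it simply assumes the inverse square law together with the figures given in the text: levels measured with the lips 1½ inches from the ear, an mf voice 80 db above the level needed to interpret called numbers, and 15 db steps between the four kinds of voice. All names are illustrative.

```python
REFERENCE_DISTANCE_INCHES = 1.5  # lips-to-ear distance used in the text

# Level above the 50 per cent interpretation threshold at the reference
# distance, built from the 80 db figure for mf and the 15 db steps.
INTERPRETATION_LEVEL_DB = {
    "average whisper": 50.0,
    "loud whisper or pp": 65.0,
    "mf": 80.0,
    "ff": 95.0,
}


def max_distance_feet(voice, hearing_loss_db=0.0):
    """Greatest distance at which called numbers are just interpreted.

    Under the inverse square law each 20 db of margin multiplies the
    distance by 10; a hearing loss reduces the margin db for db.
    """
    margin_db = INTERPRETATION_LEVEL_DB[voice] - hearing_loss_db
    inches = REFERENCE_DISTANCE_INCHES * 10.0 ** (margin_db / 20.0)
    return inches / 12.0


for voice in INTERPRETATION_LEVEL_DB:
    print(f"{voice:20s} {max_distance_feet(voice):8.1f} feet")
```

Passing a hearing loss of 30 to the same function gives distances close to those quoted for a noisy office, since a 30 db threshold shift from noise acts on the margin in the same way.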

As mentioned before, the noise in the usual city office is 
sufficient to shift the threshold of hearing 20 or 30 sensation 
units without causing any annoyance. The hearing distances 
corresponding to the four intensities of the voice which would 
be obtained in such an office might be as small as 15 inches, 
7 feet, 40 feet, and 222 feet instead of the larger distances given 
above. It is seen, therefore, that a speech test made in such 
an office would not differentiate between ears that are normal 
and those having a 30 db loss in hearing. The importance of 
making hearing tests in soundproof booths is thus evident. 

The maximum distances at which persons having different 
amounts of hearing loss can interpret these four intensities of 
voice were similarly calculated and are given in Table XXII. 

It is generally recognized that it is almost impossible to 
obtain accurate results by means of the usual speech methods 
because of the uncertainties of controlling the intensity of the 
voice, and the noise and acoustic conditions of the testing 
room. However, for rough work these tests are useful and 
this table should aid in the interpretation of the results obtained 
from such tests. It is evident that there are two ways of 
varying the intensity of the speech sounds arriving at the ear 
of the patient, namely, by varying the intensity of calling and 
using a fixed distance, or by varying the distance and using a 
constant intensity of calling. It is seen from Table XXII that 
when the first method is used, if the tester calls at a distance 
of 15 inches and the patient cannot interpret an average 
whisper, his hearing loss is greater than 30 db; if he cannot 
interpret a loud whisper or pp voice, it is greater than 45 db; 
if he cannot interpret a mf voice, it is greater than 60 db; and 
finally, if he cannot interpret a ff voice (shouting intensity) 
it is greater than 75 db. It is thus seen that a rough measurement 
of the hearing loss in sensation units is obtained by varying 
the intensity of calling at a given distance. 


TABLE XXII 

Maximum Distances in a Quiet Place Free from Reflections for Interpreting 
Called Numbers by Persons Having Various Amounts of Hearing Loss 

Hearing Loss      Average         Loud Whisper     mf Voice       ff Voice 
Sensation Units   Whisper         or pp Voice 

      0           39.5 feet       222 feet         1250 feet      1⅓ miles 
      5           22.2 feet       125 feet         704 feet       3950 feet 
     10           12.5 feet       70 feet          395 feet       2220 feet 
     15           7.0 feet        39.5 feet        222 feet       1250 feet 
     20           4.0 feet        22.2 feet        125 feet       704 feet 
     25           2.2 feet        12.5 feet        70 feet        395 feet 
     30           15 inches       7.0 feet         39.5 feet      222 feet 
     35           8.5 inches      4.0 feet         22.2 feet      125 feet 
     40           4.7 inches      2.2 feet         12.5 feet      70 feet 
     45           2.7 inches      15 inches        7.0 feet       39.5 feet 
     50           1.5 inches      8.5 inches       4.0 feet       22.2 feet 
     55           0.8 inch        4.7 inches       2.2 feet       12.5 feet 
     60                           2.7 inches       15 inches      7.0 feet 
     65                           1.5 inches       8.5 inches     4.0 feet 
     70                           0.8 inch         4.7 inches     2.2 feet 
     75                                            2.7 inches     15 inches 
     80                                            1.5 inches     8.5 inches 
     85                                            0.8 inch       4.7 inches 
     90                                                           2.7 inches 
     95                                                           1.5 inches 
    100                                                           0.8 inch 
    110           May be reached by speaking tube 
    115           May be reached by speaking tube 
    120           Totally deaf 

Probably more accurate results can be obtained by varying 
the distance, using a constant intensity of voice. For distances 
smaller than 6 inches the results will be unreliable due to the 
uncertain reflections between the head of the listener and the 
head of the speaker and the difficulty of making accurate 
measurements of the distance. This limits the maximum loss 
of hearing that can be measured by this method. The noise 
conditions and extreme distances limit the minimum loss that 
can be measured. For an average whisper this practical 
range is from a hearing loss of 40 to 20; for the loud whisper 
or pp voice from 55 to 20; for the mf voice from 70 to 30; 
for the ff voice from 85 to 45; the noise conditions limiting the 
lower value in the first two cases and the size of the room 
in the last two. When a greater accuracy than 10 or 15 db 
is desired it is necessary to calibrate the voice rather than 
rely upon the tester’s judgment as to which intensity of voice 
he is using. This is done by determining the normal hearing 
distance or its equivalent for the particular intensity used. 
If $d_0$ is such a distance and $d$ the distance that the patient can 
hear the same voice, then the H.L. (hearing loss) is given by 

$$\text{H.L.} = 10 \log \frac{d_0^2}{d^2} = 20 (\log d_0 - \log d). \qquad (2)$$

Since the normal hearing distances are so large, indirect means 
must be used in their determination except for intensities 
smaller than the average whisper. By combining both 
methods a fairly large range of hearing can be covered. It 
depends, however, upon the ability of the tester to change 
accurately the intensity of calling from one type to the other, 
an accomplishment that is seldom realized. 
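
Equation (2) is easily evaluated once both distances are known. The following is a minimal sketch; the function name and the sample distances are chosen for illustration and are not taken from the text.

```python
import math


def hearing_loss_from_distances(normal_distance, patient_distance):
    """Hearing loss in db from equation (2).

    Both distances must be in the same unit and refer to the same
    intensity of voice.
    """
    if normal_distance <= 0 or patient_distance <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * (math.log10(normal_distance) - math.log10(patient_distance))


# Hypothetical case: a whisper normally heard at 40 feet is heard by
# the patient at only 4 feet, a tenfold reduction in distance.
print(round(hearing_loss_from_distances(40.0, 4.0), 1))
```

Each tenfold reduction in hearing distance corresponds to 20 db of hearing loss, which is why distance alone covers only a limited part of the intensity range.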

When this method is used by a tester who is somewhat overanxious 
that the patient show an improvement in hearing it 
might give quite erroneous results. If he should use the 
pp voice and find the hearing distance 1 foot before treatment, 
and then in the test after treatment, in his anxiety, raise the 
intensity of his voice to that corresponding to mf, he would 
find the hearing distance had increased to 5 feet, even though 
there were no change in the patient’s ability to hear. The 
increase from 1 to 5 feet would sound like a big improvement 
to the patient. 

As shown in Appendix D, an improvement in hearing for 
the same patient of 1 foot for an average whisper is the same as 
an improvement of about 5.6 feet for the pp voice, 32 feet for the mf 
voice, and about 180 feet for the ff voice. Equation (4) of Appendix D 
shows that if the hearing distance is expressed as a fraction of 
the normal distance this fraction will be the same for all intensities 
of voice used. This shows that the common practice 
of giving results of voice tests as fractions of the normal 
distance has a logical basis. As will be seen later, there is no 
such logical reason for expressing the results of tuning-fork 
tests as fractions of the normal time. As indicated in Table 
XXII, the intensity range of hearing speech is 120 db, corresponding 
to a distance variation of 1,000,000 to 1. The range 
from 0 to 60 db loss or 50 per cent loss for speech corresponds 
to a distance variation of 1000 to 1. This emphasizes the fact 
that variation of distance alone is entirely inadequate to cover 
the complete intensity range for hearing. 
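
The two distance ratios quoted above follow from the inverse square law alone. As a short worked check, writing $d_0/d$ for the ratio of normal to patient hearing distance (the symbols of equation (2)):

```latex
\text{H.L.} = 20 \log \frac{d_0}{d}
\quad\Longrightarrow\quad
\frac{d_0}{d} = 10^{\,\text{H.L.}/20},
\qquad
10^{120/20} = 10^{6} = 1{,}000{,}000,
\qquad
10^{60/20} = 10^{3} = 1000 .
```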


Reduction of Watch Tick, Acoumeter, and Coin Click Tests to 
Sensation Units Hearing Loss 

The watch-tick test is probably more familiar to the 
average person than any other hearing test, because it is so 
commonly used in physical examinations. The distance for 
hearing is determined in a similar way to that used in the 
speech test and the results are recorded similarly as a ratio. 
These results can be reduced to sensation units hearing loss 
by the method outlined for voice tests. 

In this case, however, reflections from objects in the room, 
especially from the head of the patient, produce very marked 
irregularities. This is due to the fact that the components of 
sound from the watch are mostly in the high frequency region 
around 2000 cycles per second. At such high frequencies 
more definite reflection patterns are produced by the objects in 
the room than for the low frequencies. The dotted line in 
Fig. 100 shows a reduction curve which was determined 
experimentally for a watch tick and illustrates very well the 
irregularities which commonly exist in a room even when well 
damped. The solid curve in this figure is based on the inverse 
square law. This curve is drawn from equation (3) of Appendix D, 
the abscissas representing values of $20 \log d$, the value 
of $y_0$ being $20 \log d_0$. It is thus seen that if $d_0$, the normal 
hearing distance for the kind of sound used in the test, is 
determined, the value of $y_0$ can be read directly from the 
curve. From the distance $d$ that a patient hears the same 
sound, the value of $y$ is likewise obtained. If the values of 
$y$ and $y_0$ thus obtained are on the same side of the zero line, 
their difference gives the hearing loss; if on opposite sides 
their sum gives the hearing loss. For example, the distance 
$d_0$ for a Bristol watch is about 6 feet, corresponding to a value of 
$y_0$ equal to 15.5. If a patient can just hear this watch at a 
distance of 4 inches corresponding to a value of $y$ equal to 9.5, 
then the hearing loss for the watch tick is 25 db. 

Fig. 100. Chart for Reducing Results of Speech, Watch Tick and Acoumeter 
Tests to Sensation Units of Hearing Loss. 
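
The Bristol watch example can be reproduced arithmetically. The check below assumes that the chart measures distance in feet, so that its zero line falls at 1 foot; this is inferred from the quoted values of 15.5 and 9.5 and is not stated explicitly in the text.

```latex
y_0 = 20 \log 6 \approx 15.6,
\qquad
y = 20 \log \tfrac{4}{12} \approx -9.5,
\qquad
\text{H.L.} = y_0 - y \approx 15.6 + 9.5 \approx 25 \ \text{db}.
```

The two readings lie on opposite sides of the zero line, so their magnitudes are added, in agreement with the rule stated above.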

In Fig. 101 are shown the relative amplitudes of the component 
frequencies in a watch-tick sound. From this it will be 
seen that the watch tick is essentially a test of acuity for frequencies 
in the neighborhood of 2000 cycles per second. The 
sensation levels of the sound from several types of watches 
when held in contact with the ear were determined to be in a 
range from 40 to 70 db. An average for good grade watches 
is 45 sensation units. This means that a patient who can just 
hear such a watch tick has a 45 db hearing loss for frequencies 
near 2000 cycles per second. 

Fig. 101. Frequency-Amplitude Acoustic Spectrum for Bristol Watch Tick. 

The results of acoumeter or coin click tests can be reduced 
in the same way. They are open to the same objections as 
mentioned for the watch tick. This test is also essentially a 
test for high frequencies. 

Reduction of Results of Tuning-Fork Tests to Sensation Units 
Hearing Loss 

In the standard method for determining the acuity of hearing 
at different pitches, use has been made of a series of tuning 
forks having various vibration rates through the audible range. 
The fork is given a standard blow of some sort to set it into 
vibration. It is then held as close to the ear as possible, 
preferably with the flat part of the prong directly facing the 
auditory meatus. The time $t$ in seconds from the striking of 
the blow until the patient no longer hears the sound is observed. 
A comparison of this time to that $t_0$ required for normal hearing 
gives a measure of the hearing by air conduction. It is well 
known in dynamics that the time difference $t_0 - t$ is proportional 
to the logarithm of the ratio of sound intensities created 
corresponding to each time. Since the hearing loss expressed 
in sensation units is also proportional to the logarithm of this 
ratio, it follows that 

$$\text{H.L.} = A(t_0 - t) \qquad (3)$$

where $A$ is the constant of proportionality. The constant is 
dependent upon the damping of the tuning fork and is the 
change per second in the intensity level of the sound produced. 
Any experimental method which will measure this rate of 
change will be suitable for determining the constant $A$. It 
is seen that the hearing loss is obtained by multiplying the 
decrease in time for hearing the fork by a constant of the fork. 
If it is desired to reduce the results to per cent hearing loss, 
the following relation is used, 

$$\text{per cent hearing loss} = K \cdot A (t_0 - t), \qquad (4)$$

the factors $K$ and $A$ having definite numerical values for each 
fork used. For illustrating the kind of constants one might 
expect to obtain, the values of $A$ for three groups of tuning 
forks are given in Table XXIII. For convenience, the values 
of $K$ from Table XXI are also given. The first group of forks 
is from a set used in Bell Telephone Laboratories. The 
second set is used by Dr. E. P. Fowler. The last one is a 
500-cycle standard fork used by Dr. Douglas Macfarlan. For 
example, when Dr. Fowler’s 256 fork is used, the per cent 
hearing loss is given by 

$$\text{per cent hearing loss} = 1.25\,(70 - t), \qquad (5)$$

where $t$ is the air conduction time for the patient. It is evident 
from the formula that a person having more than 87½ per cent 
loss will not hear the fork at all. 
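
Equations (3) and (4) can be sketched as two small functions. The function names and the patient time of 30 seconds are assumptions for illustration; the constants are those listed for Dr. Fowler's 256 fork in Table XXIII (A = 1.38, normal time 70 seconds, K = .91).

```python
def fork_hearing_loss_db(damping_constant, normal_time_s, patient_time_s):
    """Hearing loss in db from equation (3): A times the decrease in time."""
    return damping_constant * (normal_time_s - patient_time_s)


def fork_per_cent_hearing_loss(k, damping_constant, normal_time_s, patient_time_s):
    """Per cent hearing loss from equation (4)."""
    return k * fork_hearing_loss_db(damping_constant, normal_time_s, patient_time_s)


# Constants for Dr. Fowler's 256 fork, as listed in Table XXIII.
A, T0, K = 1.38, 70.0, 0.91

# Hypothetical patient who hears the fork for 30 seconds.
print(round(fork_hearing_loss_db(A, T0, 30.0), 1))
print(round(fork_per_cent_hearing_loss(K, A, T0, 30.0), 1))

# A patient who does not hear the fork at all (t = 0) marks the
# largest loss the fork can measure.
print(round(fork_per_cent_hearing_loss(K, A, T0, 0.0), 1))
```

The product K·A is about 1.26 for this fork, which equation (5) rounds to 1.25; the limiting value printed last therefore comes out slightly above the 87½ per cent quoted in the text.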

TABLE XXIII 

Typical Constants for Tuning Forks 

Rate in dv    Damping         Time t_0 in Seconds    Factor K for Reducing 
              Constant A      for Normal Air         Hearing Loss to 
                              Conduction             Per Cent Hearing 

Bell Telephone Laboratory Forks 

    24        .30              75                    4. (?) 
    48        1.9              51                    2.2 (?) 
    64        1.61             41                    1.5 
   100        1.75             30                    1.28 
   200        .93             110                    .96 
   400        .46             140                    .83 
   500        .59             135                    .81 
   800        .87             112                    .77 
  1000        1.19             71                    .76 
  1200        1.14             69                    .76 
  1800        2.29             44                    .77 
  2000        2.41             45                    .77 

Dr. E. P. Fowler’s Forks 

   128        1.08             65                    1.09 
   256        1.38             70                    .91 
   512        1.31             95                    .79 
  1024        1.70             40                    .76 
  2048        2.17             20                    .77 

Dr. D. Macfarlan’s Standard 500 Cycle Fork 

   500        2                55                    .81 

Hearing loss = A (t_0 - t) 

Per cent hearing loss = KA (t_0 - t) 


It is useful to notice that if the hearing loss of the tester is 
known, the hearing loss of the patient can be found as follows: 
Set the fork into vibration by any means, not necessarily using 
a standard blow. Hold the fork to the patient’s ear in the 
standard fashion. Start the stop watch when the patient 
signals he no longer hears the tone. Hold the fork to your 
own ear in the same standard way. Stop the watch when 
you cease to hear the tone. Then the reading of the watch 
in seconds, multiplied by the constant of the fork A, gives the 
difference in db between your hearing loss and that of the 
patient. 

This method has the advantage that the results are not 
dependent upon the initial blow given to the fork, but it has 
the disadvantage that it depends upon the hearing of the 
tester, and any noise in the room affects the hearing of the 
tester much more than that of the patient, provided the 
former is much more acute than the latter. 

For example, suppose that you know that the tester’s 
hearing loss is 20 db at 1024 cycles per second, and also, when 
using the technic described above and Dr. Fowler’s 1024 fork, 
the time difference is found to be 25 seconds. Then the hearing 
loss of the patient is 

1.70 × 25 + 20 = 62.5 

or his per cent hearing loss is 

62.5 × .76 = 47.5 per cent. 
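
The worked example above can be written as a short calculation. This is only a sketch: the function name is an assumption, and it supposes, as the example does, that the patient ceases to hear the fork before the tester does.

```python
def patient_hearing_loss_db(damping_constant, time_difference_s, tester_loss_db):
    """Patient's hearing loss found by comparison with the tester.

    time_difference_s is the stop-watch reading: the interval between
    the patient and the tester ceasing to hear the same fork.
    """
    return damping_constant * time_difference_s + tester_loss_db


# Figures from the example: Dr. Fowler's 1024 fork (A = 1.70, K = .76),
# a 25 second difference, and a tester with a 20 db loss.
loss_db = patient_hearing_loss_db(1.70, 25.0, 20.0)
per_cent = loss_db * 0.76

print(round(loss_db, 1))   # hearing loss in db
print(round(per_cent, 1))  # per cent hearing loss
```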

It would be a great advantage to those who use tuning forks 
if the makers of such forks for otologic purposes would furnish 
values of the damping constants as well as the frequency of 
vibration. Then the results of tuning-fork tests could be 
reduced to a common basis, as indicated above. 

In order to increase the speed and accuracy of making 
hearing tests which are equivalent to those which have been 
described, new types of measuring instruments have been 
developed which are called audiometers. For making speech 
tests, a phonograph type of audiometer has been developed. 
In this instrument records which have been made by special 
electrical processes are used for reproducing the speech sounds. 
This insures a definite volume of voice, so that accurate comparisons 
are possible. The recorded speech sounds are transformed 
into their electrical equivalents by means of an electromagnetic 
reproducer. The electrical waves thus created are 
carried by means of wires to a receiver which is held to the 
patient’s ear. The intensity of the sound sent into the patient’s 
ear is controlled by electrical resistances in the electrical circuit. 
This phonograph type audiometer is now available in 
different forms, according to the purpose for which it is 
to be used. For making a survey of the hearing of a large 
group to find those who have a deficiency for hearing speech, 
it is furnished with the turntable unit, special records upon 
which the recorded speech sounds continually decrease in 
intensity as the record revolves, and multiple units containing 
8 receivers per unit. A master sheet is furnished, which gives 
the hearing loss corresponding to the intensity of each of the 
numbers on the record. In this form no batteries or attenuators 
are required. An instrument of this sort was tried out 
at one of the public schools in New York City, and it was 
found that with a single phonograph unit and 5 multiple units, 
that is, 40 receivers, the pupils could be tested at the rate of 
about 100 per hour. The entire school of about 1000 pupils 
was tested in three days, and a number determined for each 
pupil which represented his hearing ability for speech. 

Figure 102. 

In Fig. 102 a picture of this instrument in operation in a 
room is shown. Through the efforts of the American Federation 
of Organizations for the Hard of Hearing, 
school children had their hearing tested with instruments of 
this type last year (1927). These tests indicated that from 
8 to 12 per cent of the children have defective hearing. It was 
through the cooperative efforts of this organization and Bell 
Telephone Laboratories that this instrument was developed. 

In Fig. 103 a close-up view of the turntable unit and the 
tray unit is shown. For purposes of testing more accurately 
the hearing of one individual at a time, an attenuator is interposed 
between the turntable unit and the receiver, and a record 
is used which gives a constant speech level. For use in schools 
for the deaf or for testing patients who are very hard of hearing, 
an amplifier unit is interposed between the turntable unit and 
the attenuator box. By means of the attenuator box, the 
speech intensities being delivered to the ear of the patient can 
be varied 10 billion-fold by turning the dial from one end of the 
scale to the other, this intensity variation corresponding to a 
distance variation of more than 100,000. This makes it 
possible to test all degrees of hearing from normal hearing to 
total deafness. Not only does it have this increased range over 
that obtainable in the ordinary speech test, but it has a 
definite intensity level for the speech and is not subject to 
variations in the acoustic properties of the room. 

Figure 103. 

The 3-A audiometer was developed to take the place of 
such tests as the watch-tick tests, the acoumeter tests, or those 
designed to make a very quick test of the general hearing level. 
In this instrument a tone having components throughout the 
entire speech frequency range is electrically generated and 
delivered to a receiver to be held on the ear of the patient. 
The volume of the tone is controlled by the same attenuator 
unit used in the other audiometers. It reads directly, either 
in db loss or per cent hearing loss, since it was found that 
for this tone there were approximately 100 db between the 
threshold of hearing and the threshold of feeling for the 
normal ear. A picture of this instrument as used for measuring 
noise was shown in Fig. 58. When used for measuring 
hearing the offset receiver shown is replaced by the usual 
head receiver. This instrument has been found to be particularly 
useful in schools for the deaf. It enables the teachers to 
grade the degree of hearing of the child very quickly, and thus 
aids them in deciding upon the kind of methods to be used in 
teaching him. It is also useful in making a quick test of the 
hearing of large groups when they are tested one at a time. 

This 3-A audiometer is also useful when making tests of 
bilateral deafness, for it can be used in place of the Barany 
noise apparatus. For this purpose it is superior to the latter 
device, because the volume can be placed at exactly the proper 
level to sufficiently mask the hearing in the good ear without 
in any way interfering with the hearing in the bad ear. The 
proper level for doing this depends upon the relative difference 
in the hearing of the two ears. 

A modification of this instrument so that the generator 
derives its power directly from the alternating current usually 
supplied for lighting purposes has recently been made. This 
modified instrument is called the 5-A audiometer. A set 
similar to this buzzer-type instrument, but having a very 
limited intensity range, was developed some time ago by 
Seashore. It has been used mainly by psychologists in testing 
variations in the acuity of hearing of persons having no noticeable 
defect in the hearing mechanism. 



Figure 104. 


For replacing the tuning-fork test several types of audiometers 
have been developed by Seashore, Dean and Bunch, 
Knudsen and Jones, Kranz, Bell Telephone Laboratories, and 
others. These instruments consist essentially of a generator 
for producing alternating currents of various frequencies, an 
electrical attenuator for varying the intensity of these currents, 
and telephone receivers for converting them into sound. The 
two which were developed by Bell Telephone Laboratories and 
known as the 1-A and the 2-A audiometers are shown in Figs. 
104 and 105. The generator for these audiometers consists of 
a special vacuum tube oscillator designed especially for producing 
pure tones. The 2-A audiometer has been designed for 
general practice, and for this reason great stress was placed 




Audiogram blanks are furnished with the instrument, which 
makes it possible to show graphically the hearing loss at each 
frequency. 



Figure 106. [Audiograms of Patients 1 to 10; hearing loss plotted against frequency from 16 to 8192 cycles per second.] 

The 1-A audiometer is similar in principle to the 2-A 
audiometer, but has a much greater range in frequency and 
intensity. Both of these instruments are equipped with a jack, 
so that auxiliary equipment may be used with them. The 
otologist making the test frequently desires to talk to the 
patient either before or during the time that the pitch-range 
test is being made. For this purpose a talking set has been 
provided. By plugging the end of the cord of the talking set 
into the jack of the audiometer, it is possible to talk directly 
to the patient through the receiver which he is using during 
the test. By turning the attenuating dial, the volume of the 
speech thus delivered to the patient's ear can be varied in 
intensity from high values to the threshold of audibility. Also, 
the phonograph audiometer may be used as an auxiliary to both 
the 1-A and the 2-A audiometers by plugging the cord from the 
turntable unit of the phonograph audiometer into the appropriate 
jack. Then the speech waves coming from the phonograph 
record will be delivered directly to the receiver of the 
patient's ear. The speech volume may be controlled by the 
attenuator dial. A mark on this dial can be determined so 
that readings on the scale from this mark will give directly the 
patient's hearing loss for speech. 

To show the comparative results obtained by different 
methods of testing, the degrees of hearing of ten persons were 
tested by four different methods, namely, the phonograph 
audiometer, the 3-A audiometer, the standard speech test, and
the watch-tick test methods. The results obtained are shown 
in Table XXIV. 

Audiograms obtained by the 1-A and the 2-A audiometers
for the persons tested are given in Fig. 106. On these audio- 
gram charts the hearing losses for speech and for the 3-A 
audiometer tone are shown at the left and right, respectively. 
The black rectangle represents the watch tick results. A 
comparison of the results shown in columns III and IV of 
Table XXIV shows that the phonograph audiometer gives 
results which are in good agreement with the reduced results 
from the standard speech tests. The differences are well 
within the observational error. 

The watch tick results are just what one would expect from
the amplitude frequency characteristic of this type of sound.
It is interesting to note that if patients 3 and 5 were given a
watch-tick test only, they would be considered to have the same
amount of hearing, but further tests showed that patient 3
could hear speech at 20 feet, while patient 5 could hear it only
2 feet away. It is evident from the audiograms why this is
true. Patient 5 hears the high frequencies very much better
than the low frequencies.


TABLE XXIV

Columns: I, Ear Number. II, Distances for Interpreting Numbers
(Voice A, V0 = 58; Voice B, V0 = 65). III, Hearing Loss Cal. from II.
IV, Hearing Loss, Phonograph Audiometer. V, Distances for Watch Tick
Test (Watch A, V0 = 10; Watch B, V0 = 15). VI, Hearing Loss Cal.
from V. VII, Hearing Loss, 3-A Audiometer.

 I      II (A)           II (B)            III (A)  III (B)  IV    V (A)    V (B)     VI (A)  VI (B)  VII

1-R     800 ft. (cal.)   1800 ft. (cal.)     0        0       0    40 in.   70 in.      0       1       0
1-L     800 ft. (cal.)   1800 ft. (cal.)     0        0       0    20 in.   36 in.      6       5       0
2-R     140 ft. (cal.)   320 ft. (cal.)     15       15      15    10 in.   31 in.     12       7      10
2-L     140 ft. (cal.)   320 ft. (cal.)     15       15      15    15 in.   64 in.      8       2      10
3-R     20 ft.           20+ ft.            32        ?      30    3 in.    5.2 in.    22      22      30
4-L     3 ft.            6 ft.              48       49      35    ..       1 in.      ..      39      35
5-L     1.7 ft.          3.7 ft.            53       54      45    2 in.    4.5 in.    26      23      40
5-R     1.8 ft.          3.7 ft.            52       54      50    3 in.    6.5 in.    22      21      45
6-L     2 ft.            3.5 ft.            52       54      50    ..       contact    ..      55      ..
6-R     1.5 ft.          4 ft.              54       53      45    ..       .5 in.     ..      45      45
7-R     .9 ft.           2 ft.              57       5[?]    50    ..       contact    ..      55      ..
8-L     1.2 ft.          3.7 ft.            56       54      60    ..       ..         ..      ..      50
8-R     1.2 ft.          3.7 ft.            60       54      60    ..       contact    ..      55      50
4-R     No test          ..                 ..       ..      65    ..       ..         ..      ..      ..
9-L     8 in.            13 in.             62       64      65    ..       ..         ..      ..      60
9-R     5 in.            8.5 in.            6[?]     68      70    ..       ..         ..      ..      60
7-L     8 in.            15 in.             61       63      70    ..       ..         ..      ..      60
10-R    3.5 in.          9 in.              69       68      70    ..       ..         ..      ..      45
10-L    2 in.            6 in.              74       71      75    ..       ..         ..      ..      60
3-L     No test          ..                 ..       ..      90    ..       ..         ..      ..      80


An examination of the various audiogram charts indicates that
the 3-A audiometer gives a good criterion of the general
hearing level. However, results obtained by it are not
definitely related to the hearing loss for speech, although
there is a general agreement. A notable exception is the lack
of agreement in the results obtained for patient 10. The 3-A
type audiometer gave 45 db for the right and 60 db for the left
ear. The phonograph audiometer gave the hearing loss for speech
as 70 db for the right and 75 db for the left ear. An
examination of the audiogram for patient 10 shows why this
should be. The most important frequencies for speech
interpretation, that is, from 500 to 2000, are considerably
below the general level, the low frequencies being the highest.
This is also true for patients 8 and 9. For patient 5 the speech
frequencies are on about the same level as the low frequencies,
but very little loss is shown for the high frequencies. In this
case it is probable that the 3-A audiometer tone was heard
because of components between 2000 and 8000 cycles per second.


TABLE XXV
Hearing Loss for Speech

Ear       Calculated from   Observed       Ear       Calculated from   Observed
Number    Audiograms        Average        Number    Audiograms        Average

1-L             2              0           6-L            55              50
1-R             2              0           6-R            48              50
2-L            20             15           7-L            48              50
2-R            16             15           7-R            73              70
3-L            39             31           8-L            61              58
3-R            89             90           8-R            60              58
4-L            40             40           9-L            68              65
4-R            72             65           9-R            72              70
5-L            5[?]           50           10-L           75              75
5-R            61             55           10-R           66              70

As will be seen from the experiments discussed in Part Four,
the important frequencies for recognizing speech are between
500 and 2000 cycles per second. The simple method of taking
an average of the hearing losses at 512, 1024, and 2048 for
determining the hearing loss for speech gives results in good
agreement with observations on these ten patients. In Table
XXV is shown a comparison of results thus obtained. The
observed values are averages of the figures given in columns
III and IV of Table XXIV. The per cent hearing loss is
obtained by multiplying the figures by .83. Until more
accurate methods are devised, this simple procedure will be
useful in calculating from the audiogram the per cent hearing
loss for speech, which figure is of greatest interest to the patient.
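
The averaging rule just described can be written out as a short calculation. The following is only a minimal sketch in Python; the function names and the example audiogram readings are hypothetical and are not taken from the tables.

```python
def speech_hearing_loss_db(loss_512, loss_1024, loss_2048):
    """Average of the hearing losses (db) at 512, 1024 and 2048 cycles."""
    return (loss_512 + loss_1024 + loss_2048) / 3.0


def percent_hearing_loss(loss_db, factor=0.83):
    """Per cent hearing loss for speech: the db figure multiplied by .83."""
    return factor * loss_db


# Hypothetical audiogram readings, chosen only to illustrate the arithmetic.
example_db = speech_hearing_loss_db(50.0, 60.0, 70.0)
example_percent = percent_hearing_loss(example_db)
print(example_db, example_percent)
```

With these illustrative readings the average loss is 60 db, which corresponds to about 50 per cent hearing loss for speech.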

It is thus seen that by the use of the hearing loss scale in 
db, it is possible to express the results of the different methods 
of testing upon a common basis so that they may be directly 
compared. 










Part Four 

The Perception of Speech and Music 







CHAPTER I


The Loudness of Sounds 

When a sound of any character is impressed upon the ear 
the magnitude of the sensation produced is called the loudness 
of the sound. It is related to the intensity of the sound but 
the relationship is very complicated, depending upon the 
character of the sound. Two sounds which produce equal 
intensities at the ear are not generally recognized as being 
equally loud. Let two sound sources of different character
be adjusted so that they sound equally loud. Then let the
intensity level of each of the sounds be raised the same
amount. In general they will no longer sound equally loud.

For these reasons, neither intensity level nor sensation
level can be taken as a measure of loudness. Some scale must 
be used so that the loudness of one sound as indicated by the 
number assigned to it will be the same as that of any other 
sound having the same number on the loudness scale. 

It has been suggested that the unit of loudness be chosen to 
be the least perceptible increment in intensity. As seen from 
Part Three, Chapter III, the fractional perceptible increase 
varies through a wide range depending upon the sensation level 
and pitch of the tone. Although the relations would be 
simpler if this fraction were constant so that a logarithmic 
scale could be used, nevertheless, a scale could be built with 
the threshold intensity as zero loudness and with each per- 
ceptible increment above this intensity designated as one addi- 
tional unit of loudness. The numbers on such a scale would 
then be the number of distinguishable gradations in intensity 
for any sensation level a − a₀ or if L represents the loudness
on such a scale. 






The difficulty with such a scale is that when dealing with tones
of different pitch, equal loudness numbers do not correspond
to tones which sound equally loud. For example, a tone of
pitch −4 octaves and at a sensation level of 40 db sounds
equally loud to a tone having a pitch level of zero and which
is at a sensation level of 70 db; the loudness numbers on the
above scale corresponding to these two tones are 20 and 125,
respectively. This difficulty can be avoided if some sound be
chosen as a reference for comparison, for example, a pure tone
having a pitch level of zero. The loudness of any other sound
would then be represented by the loudness of this reference
standard when it is adjusted to sound equally loud to the sound
being measured. However, after choosing such a standard,
the scale based upon equal detectable increments loses its
significance and is apt to be misleading. For these reasons, it
seems better to choose the sensation level of a pure tone having
the zero reference pitch, that is, corresponding to a frequency
of 1 kilocycle per second, as a measure of its loudness. The
loudness L expressed in decibels of such a reference tone is
related to the intensity level a, the intensity I expressed in
microwatts, and the pressure variation near the drum of the ear
p expressed in bars, by the formula

L = a + 92 = 10 \log I + 92 = 20 \log p + 66. (2)

The loudness of any other sound, whether a pure tone having a 
different pitch, a musical tone, or any other complex sound, is 
measured by the loudness of the reference tone which sounds 
equally loud as judged by an average normal ear. 
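
Formula (2) lends itself to a direct numerical check. The sketch below, in Python, evaluates its three forms for the 1-kilocycle reference tone; the function names are illustrative only, and the logarithms are taken to base 10, as is usual for decibel quantities.

```python
import math


def loudness_from_intensity_level(a_db):
    """L = a + 92, with a the intensity level in db."""
    return a_db + 92.0


def loudness_from_intensity(i_microwatts):
    """L = 10 log I + 92, with I expressed in microwatts."""
    return 10.0 * math.log10(i_microwatts) + 92.0


def loudness_from_pressure(p_bars):
    """L = 20 log p + 66, with p expressed in bars."""
    return 20.0 * math.log10(p_bars) + 66.0


# The three forms agree when a = 10 log I and 10 log I = 20 log p - 26.
pressure = 0.01                               # bars, illustrative value
intensity = 10.0 ** ((20.0 * math.log10(pressure) - 26.0) / 10.0)
print(loudness_from_pressure(pressure), loudness_from_intensity(intensity))
```

Both calls print the same loudness (26 db for this illustrative pressure), which is simply a restatement of the equality between the last two members of formula (2).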

Loudness of Pure Tones 

Those who have tried to make measurements on the loud- 
ness of sounds having different pitches realize the difficulty in 
obtaining from different individuals judgments which are
consistent. The average normal ear as used above is hypothetical
but nevertheless very important. It implies that before any
observed value is sufficiently reliable so that it may be
duplicated by another experimenter, the results from a large number
of observers must be obtained. This is true of most of the 
work described in this part of the book. 

Some pioneer work on the relative loudness of pure tones
using organ pipes as sources of sound was done by Sabine.¹
The sensation level of each tone was determined from the time
necessary for the intensity of the sound to decrease to the
threshold after the source was cut off. MacKenzie² of Bell
Telephone Laboratories also made measurements of relative
loudness using an instrument called the Alternation Phonometer.
Due to the rapidity of alternation of the tones being
compared (twenty-five per second) there is some question as
to the general application of his results.

1 "Collected Papers on Acoustics," p. 130.

2 Physical Review, October, 1922.

Fig. 107. Schematic Diagram of Apparatus.

The most comprehensive work on the relative loudness of
pure tones is that reported by Kingsbury of Bell Telephone
Laboratories. He found that reliable average values were
obtained by using as observers eleven men and eleven women.
Figure 107 shows a schematic diagram of the apparatus which
he used. A 700-cycle tone was used as a standard of
comparison. The attenuator in the A system was arranged so
that the tone of this frequency generated by the receiver 
could be brought to any intensity level. Three independent 
settings of A for the threshold of intensity were then made. 
The comparison tone was then generated by system B, and 
similarly, three measurements on the threshold of the tone of 
this system were obtained. Next, the experimenter set the A 
attenuator at one of the selected comparison levels and allowed 
the observer to adjust the B attenuator until when listening 
alternately to the two tones they seemed equally loud. The 
attenuation settings and the deflections of the meters were 
then recorded for both systems. This process was repeated 
for the other fixed levels of the tone A until three independent 
determinations had been made for each level. The order of 
taking the fixed levels of the A tone was made as random as 
possible and two successive determinations were seldom made 
at the same level. When the comparison of the two frequen- 
cies was finished, the A and the B thresholds were once more 
secured. 

The work was then repeated, using a tone of different pitch. 
In this way the sensation levels of twelve tones which appeared 
to have the same loudness as the reference 700-cycle tone were 
obtained for eight selected levels of this tone. In Table XXVI 
the results of these measurements are given. Each value is the 
average for sixty-six observations, three independent measure- 
ments being taken for each of twenty-two persons. In the 
first and second columns the frequency in cycles per second 
and the pitch in centioctaves of the comparison tone are given. 
In the third column the values of sensation levels in db for 
tones which sound equally loud to a 700-cycle tone, which is 
9.3 db above the average threshold, are given. Similarly, 
each of the remaining columns gives the values for equal
loudness when the 700-cycle comparison tone is at the value
indicated opposite 700.

TABLE XXVI
Loudness of Pure Tones

Frequency   Pitch     Sensation Level

    60      -406       8.4    12.5   [?].7   19.2    24.8    29.9    37.3    41.9
    80      -364       6.6    11.4   14.9    19.6    25.1    31.7    36.1    43.9
   150      -273       9.9    15.2   20.8    25.6    32.5    39.7    [?]     52.9
   200      -232      12.7    16.4   23.2    25.9    36.5    44.8    54.4    62.2
   340      -156       9.4    18.0   25.1    33.8    41.7    50.7    61.6    73.5
   440      -118       9.3    17.7   25.7    33.8    42.2    52.5    62.9    75.3
   700       -51       9.3    19.3   29.3    39.3    49.3    59.3    69.3    79.3
  1000         0      11.7    20.7   31.1    40.7    52.1    60.9    69.8    78.5
  1500      +58.5     11.6    21.4   32.5    42.5    5[?].5  61.7    71.3    79.2
  1900      +92.7     11.9    22.4   36.3    45.7    56.7    65.0    73.7    80.2
  3200      +168       9.9    18.5   31.6    43.1    52.8    61.1    70.0    77.0
  4000      +200       8.9    22.1   31.6    44.0    53.2    61.1    68.9    ..

Loudness              10.4    20.7   32.1    42.4    52.4    61.5    70.5    78.8


According to the definition of loudness given above, any
particular value opposite 1000 gives the loudness of all the tones
corresponding to the values in that particular column. For
example, the loudness of all the tones giving the values in the
sixth column is 40.7 db. Values of loudness given in the row
corresponding to 1000 cycles are necessarily subject to an
observational error. To obtain more accurate values, the
average for tones of frequencies 700, 1000, 1500, 1900, 3200,
and 4000 was used. This procedure is only justified because
the data indicate that for tones in this range equal loudness
changes correspond to equal sensation level changes. That
this is true is another reason for choosing the type of loudness
scale indicated above. The loudness values obtained from the
averages of these six tones are given in the bottom row of this
table. Using these values, curves can now be drawn showing
the relation between loudness and sensation level for the various
frequencies. In Fig. 108 such curves are shown. It is seen
that the tones of low pitch increase in loudness much faster
than those of high pitch.
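
The averaging used for the bottom row of Table XXVI can be reproduced with a few lines of Python. This is a sketch only: the dictionaries hold the second and seventh sensation-level columns of the table for the six tones named in the text, and the pitch formula (centioctaves relative to 1000 cycles) is an assumption that happens to agree with the pitch column, not a definition quoted from the source.

```python
import math


def pitch_centioctaves(frequency_cps, reference_cps=1000.0):
    """Assumed pitch scale: hundredths of an octave above 1000 cycles."""
    return 100.0 * math.log2(frequency_cps / reference_cps)


def loudness_of_column(levels_by_frequency):
    """Average sensation level of the tones from 700 to 4000 cycles."""
    values = list(levels_by_frequency.values())
    return sum(values) / len(values)


# Second and seventh sensation-level columns of Table XXVI (db).
column_2 = {700: 19.3, 1000: 20.7, 1500: 21.4, 1900: 22.4, 3200: 18.5, 4000: 22.1}
column_7 = {700: 69.3, 1000: 69.8, 1500: 71.3, 1900: 73.7, 3200: 70.0, 4000: 68.9}

print(round(loudness_of_column(column_2), 1))
print(round(loudness_of_column(column_7), 1))
print(round(pitch_centioctaves(1500), 1))
```

For these two columns the averages agree with the tabulated loudness values of 20.7 and 70.5 db.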

All of the values given above were obtained by direct
comparison with the 700-cycle tone. The question arises:
"Will two tones having the same loudness, as indicated by
such tests, sound equally loud when directly compared?"
Kingsbury tested this point for the two tones of frequencies
200 and 3200 cycles using eleven observers. The results were
found to agree within the observational error, so it seems
reasonable to conclude that two pure tones having the same
number on the loudness scale chosen will sound equally loud
as judged by an average normal ear.

Fig. 108. Loudness of Pure Tones. (Horizontal axis: sensation
level, 0 to 100.)

Fig. 109. Contour Lines of Equal Loudness for Pure Tones. (Axis
label: intensity level.)

In Fig. 109 contour lines of equal loudness for pure tones 
are shown. These lines were derived from the data shown in 
Table XXVI. They are useful in determining intensity levels 
from loudness balances or loudness from intensity level
measurements. For example, a tone having a pitch of −3 octaves
will have a loudness of 50 db at an intensity level of −30 db.
A tone having the same loudness at three octaves higher in
pitch has an intensity level of −42. The dotted portions of
the curves go beyond any experimental data and are only the 
author’s estimate of how they should go. 

Loudness of Complex Sounds 

The comparison of the loudness of complex sounds which are
different in character is also very difficult, but reliable averages
can be obtained if a sufficient number of observers and trials are
used. The 3-A audiometer mentioned in Part Two, Chapter II,
and Part Three, Chapter VI, is useful for making loudness
measurements of complex sounds. It was calibrated in terms
of the standard 1000-cycle tone by making loudness balances
with seven observers. The average results are shown in the
curve of Fig. 110. Then to determine the loudness of any
sound, the 3-A audiometer is adjusted so that the tone from
its receiver has the same loudness as the sound being measured;
from the dial reading and the relation expressed in the curve
of Fig. 110, the loudness can then be determined.

Fig. 110. Loudness of 3-A Audiometer Tone.

In a similar way the loudness values at different sensation
levels of four different complex sounds designated as A, B, C
and D were determined and indicated in Fig. 111. The sound
A consisted of repetitions of the sentences "Joe took father's
shoe bench out" and "She was waiting at my lawn." These
sentences were selected because they contain all the
fundamental speech sounds which are important from a loudness
standpoint. They are used in the laboratory for testing the
efficiency of telephone apparatus. To insure that the loudness
of this speech would remain at any desired level, it was recorded
on a phonograph and reproduced by means of an electromagnetic
reproducer and telephone receiver. By means of an
attenuator in the electrical circuit, the level of the speech
coming from the telephone receiver could be adjusted to any
value.

Acoustic spectra of the sounds B, C, and D are also given
in the figure. As a general rule those sounds which have a
large number of components increase in loudness at a faster
rate with an increase in sensation level than those with a
smaller number of components. Also those sounds having
most of the energy in the low-frequency regions increase in
loudness faster than those with the energy in the high-frequency
regions.

Fig. 112. Per Cent of Unfiltered Energy Equivalent in Loudness to
Filtered Energy.

Measurements were made to find the effect upon the loudness
of the sound B when certain of its components were
eliminated. This sound was created by means of an electrical
buzzer which was connected to a telephone receiver. The
circuit was arranged so that by means of electrical filters (see
Figs. 134 and 135), any of the components of this sound could
be eliminated. Tests were made to determine the effect on
the loudness of eliminating certain frequency regions at three
sensation levels, namely, 22, 43, and 70 db, respectively.
The results of these tests are shown in Figs. 112, 113, and 114.
The horizontal axis gives the cut-off frequency of the filter.
In the curve labeled "High Pass Filter" results are given
for the case when all frequencies below the cut-off
frequency point are eliminated; in the curve labeled "Low
Pass Filter" for the case when all the components having
frequencies above the cut-off frequency are eliminated. The
vertical axis gives the per cent of its initial value to which the
intensity of the unfiltered tone can be reduced before it has
the same loudness as the filtered tone. For example, the
point (1500, 50) on the low pass filter curve in Fig. 114
indicates that when all the components above 1500 cycles are
eliminated, the loudness of the filtered sound is so reduced
that the unfiltered sound must be reduced 50 per cent in
intensity or 3 db to sound equally loud.

Fig. 113. Per Cent of Unfiltered Energy Equivalent in Loudness to
Filtered Energy.

Fig. 114. Per Cent of Unfiltered Energy Equivalent in Loudness to
Filtered Energy.
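
The equivalence between a percentage of the initial intensity and a reduction in db, used in reading these filter curves, follows from the ordinary decibel relation. A minimal Python sketch (the function name is illustrative):

```python
import math


def reduction_db(percent_of_initial):
    """Reduction in db when the intensity falls to the given per cent
    of its initial value."""
    return -10.0 * math.log10(percent_of_initial / 100.0)


print(round(reduction_db(50.0), 1))   # ordinate of 50 per cent
print(round(reduction_db(25.0), 1))   # ordinate of 25 per cent
```

An ordinate of 50 per cent thus corresponds to about 3 db, and an ordinate of 25 per cent (one-fourth of the initial intensity) to about 6 db.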

In telephone work a large amount of loudness balancing is
done with a source of sound consisting of speech which has been
transmitted through various kinds of systems. Experiments
similar to those described for the test tone were therefore made
with speech, the speech being transmitted through the high
quality telephone system shown in Fig. 127. The results are
shown in Figs. 115 and 116.

Fig. 115. Per Cent of Unfiltered Energy Equivalent in Loudness to
Filtered Energy.

The frequency at the intersection point of the curves in 
each case has an important meaning. The sound composed of 
only those components above this frequency will appear to 
have the same loudness as a sound with only those components 
below this frequency. At low intensities the stimulating forces 
will be confined to the lower part of the basilar membrane in 
one case and to an upper part for the other case. Since equal 
loudness sensations are produced, it is presumed that the 
number of nervous impulses sent to the brain are the same for 
the two cases. This seems to indicate that at such intensities 
the number of impulses is proportional to the intensity and 
that the law of superposition holds. At higher intensities,
however, the stimulation for the filtered tone having the low
frequencies is not confined to the lower half of the basilar
membrane but due to the subjective tones extends into the
upper half also. Similarly, for the other filtered tone due to
the difference tones, the lower half of the basilar membrane
is partially stimulated as well as the upper half. When both
tones produce stimulation simultaneously, then, on account
of the non-linearity of the various parts of the mechanism
there is an interaction between the tones so that the law of
superposition no longer holds.

Fig. 116. Per Cent of Unfiltered Energy Equivalent in Loudness to
Filtered Energy.

The tone with the lower range of frequencies increases in
loudness faster than the one with the upper range as the
intensities of both are increased. Therefore, the dividing
frequency for equal loudness shifts to the lower frequencies as
seen in Figs. 112, 113, and 114. For example, for the test tone
the dividing frequency shifts from 1300 for a sensation level of
22, to 1100 for a sensation level of 43, and to 800 for a
sensation level of 70. For the two higher levels the intersection
point is about 25 per cent instead of 50 per cent. In other
words, the intensity of the unfiltered tone must be reduced to
one-fourth its value to have the same loudness as either part
of the filtered tone. The addition of one filtered tone to the
other is equivalent from a loudness standpoint to raising the
intensity of either one separately to four times its original
intensity.

It has been stated above that the elements of the middle 
ear taking part in the transmission of the sound have a non- 
linear characteristic which accounts largely for the subjective 
harmonics. The data indicate that a linear relation exists 
between the intensity and the number of nervous impulses for 
very low intensities, but for higher intensities this relation does 
not hold. This non-linearity together with that produced in 
the middle ear, accounts for the effects just described. When 
the two filtered tones act together, no more energy can be 
expended in a given time than if each acted separately on dif- 
ferent ears, but in the former case the stimulated energy is 
distributed differently along the basilar membrane in such a 
way that more energy is applied at those positions where a 
large number of nerve fibres are stimulated. 

To obtain further light on this point loudness balances were 
made with two ears vs. one ear listening to the sound C. To 
do this the sound was adjusted in each receiver so that when 
the sound was thrown on alternately in the right and left ear, 
it produced the same loudness. The sounds in the two receivers 
were then produced simultaneously. It was found that the 
intensity of the sound when listened to binaurally must be 
reduced a db to sound equally as loud as the same sound lis- 
tened to monaurally. The number of nervous impulses reach- 
ing the brain when listening binaurally must be twice that 
when listening monaurally. Consequently, this gives a means 
of determining the number of db the sound must be raised in 
sensation level in order to double the number of nervous 
impulses. Tests made with seven observers using the buzzer
tone from the 3-A audiometer gave average values of a for
various sensation levels as shown below:

Sensation Level     a, Difference in Decibels

      29                     3.1
      46                     5.3
      64                    10.0
      81                     9.0




It is seen that the difference in loudness under these conditions
is approximately the same as that corresponding to intersection
points in the filter experiments described above. In Fig. 117
the results of both types of tests are given and a smooth curve
drawn to best fit the data. It must be remembered that such
data are very difficult to obtain on account of the wide
variation in the judgment of different observers. This accounts
for the large departures from the curve. The function
proportional to the number of nervous impulses stimulated at the
different sensation levels can be deduced from the relation
expressed in this curve. Assuming the number of such nervous
discharges to be 500,000 per second when the sensation level
of sound was 100, then a reduction of 8.8 db reduces this
number to 250,000; a further reduction of 9 db reduces it to
125,000; and so on. In this way the curves given in Fig. 118
were constructed.¹

Figure 117. (Legend: points from binaural vs. monaural listening;
points from intersection points of filter curves. Horizontal axis:
sensation level, 0 to 120.)

1 Since this was written further experimental work has been done using pure tones 
as the source of sound. Although the results obtained by the author were similar to 
those given in the table, other observers obtained widely different results. Con- 
sequently, conclusions based upon this type of reasoning are seriously open to question. 
However, such curves as those shown in Fig. 118 are very interesting and important,
but until further experimental data are secured they must be considered only as 
hypothetical. 
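
The step-by-step construction described above (halving the assumed number of impulses each time the level is lowered by the value of a at that level) can be sketched in Python as follows. This is an illustration under stated assumptions, not the author's curve: the smooth curve of Fig. 117 is replaced by straight-line interpolation through the four tabulated binaural averages and the 8.8 db quoted for a sensation level of 100, and the step is held constant outside that range. As the footnote cautions, curves of this kind must be regarded as hypothetical.

```python
# (sensation level, a in db); the last point is the value quoted in the text.
HALVING_STEP_DB = [(29.0, 3.1), (46.0, 5.3), (64.0, 10.0), (81.0, 9.0), (100.0, 8.8)]


def interpolate(points, x):
    """Piecewise-linear interpolation, held constant outside the table."""
    if x <= points[0][0]:
        return points[0][1]
    if x >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def impulse_curve(start_level=100.0, start_rate=500_000.0, points=HALVING_STEP_DB):
    """Lower the level by a(level) and halve the impulse rate, repeatedly."""
    level, rate = start_level, start_rate
    curve = [(level, rate)]
    while True:
        step = interpolate(points, level)
        if level - step < 0.0:
            break
        level -= step
        rate /= 2.0
        curve.append((level, rate))
    return curve


for level, rate in impulse_curve():
    print(round(level, 1), round(rate))
```

The first step reproduces the figures in the text (250,000 impulses per second at a level 8.8 db below 100); later steps depend on the assumed interpolation.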






Formulation of an Empirical Equation for Computing Loudness 
Losses 

If the percentages of the total energy passed by the filters
are plotted as a function of the cut-off frequency, curves which
intersect at 50 per cent are obtained. Also, the sum of the
two ordinates corresponding to any abscissa will be 100 per
cent. The curves will be the same irrespective of the absolute
value of the total energy. In the case of the experimental
curves on loudness, an exponent of the observed percentages
can be chosen so that at the intersection point the ordinate
raised to this power will be one-half. If the other observed
percentages are raised to the same power and a curve is made
of the resultant figures, it is found that the sum of the ordinates
from the two curves corresponding to any chosen abscissa is
approximately unity. The dotted curves shown in the figures
mentioned above were obtained in this way.

Fig. 118. Relation between Nervous Impulses Per Second and Sensation
Level of Sound.

If we desire to adopt the idea that each frequency component in
the external sound wave contributes an integral amount to the
resultant loudness in such a way that the component parts can be
summed to give the resultant loudness, the fractional loss can
be empirically represented by an equation of the type

y^b = \sum_k (W_k E_k / E_{k0})^b. (3)

The summation is taken over all the components. The
quantity y is the fractional decrease in the undistorted sound
necessary to make it have the same loudness as the distorted
sound. E_{k0} and E_k are the energies in each component before
and after the distortion. The weight factor W_k and the
exponential constant b can be determined from the experimental
data.

For no distortion, that is, when E_k = E_{k0},

y^b = 1 = \sum_k W_k^b. (4)

For the filters used in these experiments E_k may be
considered zero in the attenuated range and equal to E_{k0} in the
unattenuated range of frequencies, so that for the low pass
filter

y_l^b = \sum_{k=1}^{k_1} W_k^b (5)

where k_1 is the last unattenuated component.

Similarly for the high pass filter,

y_h^b = \sum_{k=k_2}^{\infty} W_k^b (6)

where k_2 is the last unattenuated component.

For any two complementary filters,

y_l^b + y_h^b = \sum_{k=1}^{k_1} W_k^b + \sum_{k=k_2}^{\infty} W_k^b = 1. (7)

This equation must hold regardless of the weight factor
function W_k. This means that the sum of the ordinates for
the two curves corresponding to any abscissa must be unity.
If this is not true, the empirical equation assumed is not
adequate. Also at the intersection point y_l = y_h so that

y^b = 1/2 or b = \log(1/2) / \log y (8)

which is sufficient to determine the value b.
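
The relation for complementary filters can be checked numerically. The Python sketch below assumes the summed form y^b = sum over components of (W E / E_0)^b, with the weights normalized so that the sum of W^b is unity when there is no distortion; the five weights and the exponent are hypothetical values chosen only for illustration.

```python
def loudness_fraction(weights, energy_ratios, b):
    """Return y from y**b = sum((W * E/E0)**b) over all components."""
    total = sum((w * r) ** b for w, r in zip(weights, energy_ratios))
    return total ** (1.0 / b)


b = 0.5                                           # hypothetical exponent
weights_to_power_b = [0.10, 0.20, 0.30, 0.25, 0.15]   # hypothetical, sums to 1
weights = [v ** (1.0 / b) for v in weights_to_power_b]

no_distortion = [1.0, 1.0, 1.0, 1.0, 1.0]
low_pass = [1.0, 1.0, 1.0, 0.0, 0.0]              # components 4 and 5 removed
high_pass = [0.0, 0.0, 0.0, 1.0, 1.0]             # the complementary filter

y_none = loudness_fraction(weights, no_distortion, b)
y_low = loudness_fraction(weights, low_pass, b)
y_high = loudness_fraction(weights, high_pass, b)
print(y_none, y_low ** b + y_high ** b)
```

With no distortion y is unity, and for the complementary pair the two ordinates raised to the power b add to unity, as the text requires.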



The value of y is related to the value of a given in Fig. 118
by the relation y = 10^{-a/10} so that

b = 3.01 / a. (9)
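
The step from the intersection ordinate to the exponent b can be verified directly. The following Python sketch assumes that y is the fraction of the initial intensity at the intersection point and that a is the same reduction expressed in db; the function names are illustrative.

```python
import math


def fraction_from_db(a_db):
    """Fractional intensity y corresponding to a reduction of a db."""
    return 10.0 ** (-a_db / 10.0)


def exponent_b(y_at_intersection):
    """Exponent that makes the intersection ordinate equal to one-half."""
    return math.log10(0.5) / math.log10(y_at_intersection)


print(exponent_b(0.25))                    # intersection at 25 per cent
print(exponent_b(fraction_from_db(9.0)))   # a of 9 db
print(3.01 / 9.0)
```

An intersection at 25 per cent gives b equal to one-half, and a value of a near 9 db gives b close to one-third, in agreement with b = 3.01 / a.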


An empirical formula similar to equation (3) modified to
fit a continuous spectrum such as speech is

y^b = \int_0^\infty (W E / E_0)^b \, df. (10)

For the filter experiments this reduces to

y_l^b = \int_0^{m_1} W^b \, df and y_h^b = \int_{m_2}^\infty W^b \, df (11)

where m_1 and m_2 are the cut-off frequencies for low pass and
high pass filters, respectively.

It is seen from these equations that the weight factor can
be obtained by taking the slope of either of the high or low
pass filter curves. Thus for computing the loss of loudness of
speech coming from a telephone receiver due to attenuating
certain frequency regions, the formula

y^{1/3} = \int_0^\infty (W E / E_0)^{1/3} \, df (12)

can be used when the sensation level is in the important
intensity range used in practice, that is, from 65 to 100 db. The
function W^{1/3} is the slope of either of the dotted curves shown
in Fig. 116 and y is the fractional energy reduction in the
undistorted speech required to make it equal in loudness to the
distorted speech.

In this range of intensities a change in loudness is approxi- 
mately the same as a change in sensation level. In other 
words, a curve showing loudness as a function of sensation 
level has a slope of approximately 45° in this range (see Fig. 111) for most sounds including various kinds of distorted
speech. Consequently if $\alpha$ is a function which gives the loss
in db at each frequency due to the introduction of some piece
of apparatus in the circuit, then the loudness loss $a$ is given by

$$10^{-a/30} = \int_0^\infty G(f) \, 10^{-\alpha/30} \, df \qquad (13)$$

where $G(f)$ is a weighting factor depending upon the type of
circuit into which the apparatus is introduced. If the system



Fig. 119. — Curves for Computing Loudness Losses.


reproduces speech perfectly, the function G{f) is the slope of 
the dotted curve in Fig. 116. A curve similar to this was also 
obtained for a transmission system having resonant elements 
such that frequencies near 1200 cycles were reproduced much 
more efficiently than those either higher or lower than this 
value. The $G(f)$ functions for these two cases are shown in
Fig. 119. The two corresponding curves for $\int_0^f G(f)\,df$ are
also given. These aid in the calculation of the above integral,
which must be done by graphical methods. For this purpose
the variables $f$ and $\alpha$ (the loss in db at each frequency) are changed to $x$ and $y$ by the relations

$$x = \int_0^f G(f)\,df \quad \text{and} \quad y = 10^{-\alpha/30}. \qquad (14)$$


The values of x are obtained from Fig. 119 and the values of y 
from the loss curves for the particular problem at hand. If 
these values are plotted on cross-section paper, then the area
between the resulting curve, the X-axis, and the ordinates at
0 and 1 gives the value of the required integral.
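The graphical procedure can be imitated numerically. The sketch below assumes that equation (13) reads 10^(-a/30) = ∫ G(f) 10^(-α/30) df, with α the loss in db at each frequency and a the resulting loudness loss, and evaluates the integral by the trapezoidal rule. The flat weighting function and the two loss curves are hypothetical stand-ins chosen only for illustration; they are not the curves of Fig. 119.

```python
import math

def loudness_loss_db(freqs, weights, losses_db):
    """Evaluate a = -30 log10( integral G(f) 10**(-alpha(f)/30) df ) by the trapezoidal rule.

    freqs     : increasing frequencies in cycles per second
    weights   : G(f) at those frequencies (assumed to integrate to 1)
    losses_db : alpha(f), the loss in db at those frequencies
    """
    integrand = [g * 10 ** (-alpha / 30) for g, alpha in zip(weights, losses_db)]
    total = 0.0
    for i in range(1, len(freqs)):
        total += 0.5 * (integrand[i] + integrand[i - 1]) * (freqs[i] - freqs[i - 1])
    return -30 * math.log10(total)

# Hypothetical flat weighting over 0-4000 cycles, chosen only so that G integrates to 1.
freqs = [100.0 * k for k in range(41)]
flat_g = [1 / 4000.0] * len(freqs)

# A loss of 6 db at every frequency, and a 30 db loss above 2000 cycles only.
uniform = loudness_loss_db(freqs, flat_g, [6.0] * len(freqs))
high_cut = loudness_loss_db(freqs, flat_g, [0.0 if f <= 2000 else 30.0 for f in freqs])
print(round(uniform, 2), round(high_cut, 2))
```

A loss that is the same at every frequency comes back unchanged as the loudness loss, which is a convenient check on any weighting function whose integral is unity.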


Comparison of Observed and Calculated Values 

In order to test this formula the loss in loudness was computed when different types of resonant networks were introduced into an otherwise distortionless system. The losses at
each frequency for six different resonant systems are shown by
the six curves in Fig. 120. The table gives the calculated and
observed loudness losses for these systems. The observed
values are averages taken by several observers.

Fig. 120. — Loss Curves of Resonant Systems and Effective Loss in Loudness
of Speech at Loudness Levels between 70 and 100 Units.

When the experimental measurement of the effective loss 
produced by such resonant networks is made at lower levels, 
the losses are smaller. Using the weighting factor functions 
derived from the experimental data for these lower levels taken 
with the filters, the curve on Fig. 121 was calculated. The 
averages obtained by several observers are shown by the circles. 
These data were obtained with the resonant system No. 1, having the response characteristic shown in Fig. 120. It is seen
that the loss in sensation units at the low intensity levels is only
about one-half that at the higher levels.

Fig. 121. — Effective Losses for Resonant System No. 1.

To illustrate how these relations can be used in telephone
engineering, it is shown in Appendix E that if a condenser of
2-microfarad capacity is connected across a long transmission
line near its middle, the loudness loss caused to the reproduced
speech being transmitted is 9.3 db when the high quality circuit is used and 12.3 db when the resonant circuit is used.

In applying these formulas for loudness losses it must be 
remembered that they are empirical in origin and hence are 
limited to only that class of data from which they were derived 
and to comparatively small loudness changes. Although they 
are adequate for most practical purposes, they are not satis- 
factory from the standpoint of understanding the fundamental 
elements which determine loudness and how they operate when 
a loudness judgment is made. Attempts to follow these 
processes and to develop methods for calculating loudness 
which have a general application have not yet been successful. 




CHAPTER II 


The Recognition of the Pitch of Musical Tones 

As used in the musical sense, the pitch of a tone is the posi- 
tion on the musical scale to which the tone belongs, the high 
pitch being high and the low pitch being low on the musical 
staff. When referring to a single pure tone, the pitch, in this
book, has been given a numerical value, namely, that given
by equation (6) of Part Three, Chapter III. When the tone has
a large number of components, as is usually the case with tones 
from musical instruments, it still retains the quality of pitch 
and the position on the musical scale can be determined. The 
numerical value of the pitch of a complex musical tone is the 
same as that of a pure tone which is judged to have the same 
pitch. The pitch of any musical tone can then be determined 
experimentally by comparing it to a pure tone which can be 
adjusted so that it seems to have the same pitch. If the device 
producing the pure tone is calibrated so that its frequency of 
vibration is known, its pitch is determined from equation (6) 
mentioned above. 

Using this criterion, then, it is important to inquire what 
types of musical tones will have the same pitch. It is well
known that for pure tones the frequency of vibration determines completely the pitch of the tone. However, when there
are several components of different frequency in the tone, it is 
not so obvious which one will determine the pitch. 

An experimental investigation was made using the musical 
tones whose spectra are shown in Figs. 51, 52, 53 and 54. 
The pitches of these musical tones were determined by com- 
parison with a pure tone. This comparison tone was produced 
by a telephone receiver which was connected to a vacuum tube
oscillator. Its pitch was adjusted to any desired value by
making the proper settings on the oscillator. The musical
tones were transmitted through the high-quality telephone 
system, into which were introduced electrical filters. By means 
of this circuit any portion of the spectrum could be eliminated. 

The judgments of pitch and quality of the musical tones 
were made by three persons familiar with music. The results
of the tests with the musical sounds mentioned are given in
Table XXVII. In every case they agreed unanimously in the
statements recorded in the fifth and sixth columns. The
musical tones were produced at sensation levels between 70
and 80 db. In the third column of this table the letter F
refers to the fundamental and the numbers refer to the overtones, thus (F & 1-6) means that the fundamental and the first
six overtones were eliminated. It is seen that the vowel ah
sung at a pitch d is affected only slightly in pitch or quality
when the fundamental and the first two overtones are elimi- 
nated. Even with the fundamental and the first six overtones 
eliminated, the pitch still very definitely corresponds to the 
pitch of a pure tone with the frequency of the fundamental, 
namely, 145 cycles per second. The harmonic analysis of this 
filtered tone shows no frequencies below 1000 cycles per 
second. Eliminating all of the overtones above the sixth 
changes the quality by about the same amount as eliminating 
the fundamental and first and second overtones. The data 
also indicate that if the fundamental and all of the upper and 
lower harmonics except the third, fourth and fifth, are elimi- 
nated, the remaining compound tone has the same pitch as the 
fundamental, although the quality of the sound is very dif- 
ferent from that of the sound ah. 
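The correspondence between the "Eliminated Components" and "Eliminated Frequencies" columns of Table XXVII can be checked with a few lines of arithmetic. The sketch below is an illustration only: the function name and the cap of 40 harmonics are assumptions, and the filters are treated as ideal, removing every harmonic that falls inside the stated band.

```python
def eliminated_components(fundamental, low, high, max_harmonic=40):
    """Return (fundamental_removed, overtone_numbers_removed) for a stop band low..high.

    Harmonic n has frequency n * fundamental; overtone k is harmonic k + 1,
    following the convention of Table XXVII (F = fundamental, 1 = first overtone).
    """
    removed = [n for n in range(1, max_harmonic + 1) if low <= n * fundamental <= high]
    fundamental_removed = 1 in removed
    overtones = [n - 1 for n in removed if n > 1]
    return fundamental_removed, overtones

# Vowel ah sung at d (145 cycles), everything below 1250 cycles removed.
ah_result = eliminated_components(145, 0, 1250)
# Vowel sung at a (218 cycles), everything below 750 cycles removed.
a_result = eliminated_components(218, 0, 750)
print(ah_result)
print(a_result)
```

For the 145-cycle tone a band of 0 to 1250 cycles removes the fundamental and the first seven overtones, so that the lowest remaining component lies at 1305 cycles, nine times the fundamental frequency.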

As indicated in the table, similar results were obtained for 
the vowel a, sung at the pitch a. Other vowels were tried 
with similar results. In general, neither the quality nor the 
pitch of notes from a rich baritone or contralto voice is appre- 
ciably affected by eliminating the fundamental and the first 
two to three overtones. If, however, higher overtones are
eliminated, the musical quality (in particular the richness) is





TABLE XXVII

Effect of the Elimination of Various Components on the Pitch and Quality
of Various Musical Sounds

Source        Pitch            Eliminated           Eliminated            Pitch Change    Quality
                               Components           Frequencies

Voice ah      d (145)          F                    0-250                 No change       Inappreciable change
                               F & 1-2              0-500                 No change       Small change
                               F & 1-4              0-750                 No change       Large change
                               F & 1-7              0-1250                No change       Very large change
                               F & 1-9              0-1500                Uncertain       Noise
                               6-∞                  1000-∞                No change       Small change
                               (illegible)-∞        500-∞                 No change       Large change
                               F & 1-2 & 6-∞        0-500 & 1000-∞        No change       Very large change

Voice a       a (218)          F                    0-250                 No change       Slight change
                               F & 1-2              0-750                 No change       Sounds like ah
                               F & 1-4              0-1250                No change       Small change
                               F & 1-5              0-1500                No change       Between ah and o
                               6-∞                  1500-∞                No change       Between ah and o
                               (illegible)-∞        750-∞                 No change       Sounds like o
                               F & 1-2 & 9-∞        0-750 & 2000-∞        No change       Very weak ah

Piano         C (129)          F                    0-250                 No change       Small change
                               F & 1-2              0-500                 No change       Metallic
                               F & 1-4              0-750                 No change       Clanging
                               (For more harmonics eliminated the tone lost all musical character)
                               5-∞                  750-∞                 No change       No brilliance

Piano         c'' (illegible)  F                    0-750                 No change       Small change
                               F & 1                0-1250                No change       Metallic
                               All harmonics        750-∞                 No change       Pure tone; musical brilliance lacking

Violin        g' (illegible)   F                    0-500                 No change       Large change
                               F & 1                0-1000                No change       Very large change
                               F & 1-2              0-1500                Uncertain       Non-musical
                               2-∞                  1000-∞                No change       Violin quality gone

Clarinet      (illegible)      F                    0-500                 No change       Large change
                               F & 1-2              0-1000                No change       Very large change
                               F & 1-4              0-1500                No change       Non-musical
                               7-∞                  2000-∞                No change       Large change
                               2-∞                  750-∞                 No change       Pure tone (no clarinet quality)

Organ pipe    (illegible)      F                    0-250                 No change       Small change
                               F & 1-2              0-500                 No change       Large change
                               F & 1-4              0-750                 Uncertain       Noise
                               15-∞                 2000-∞                No change       Very small change
                               (illegible)          (illegible)           No change       (illegible) change

(illegible)   (illegible)      (illegible)          (illegible)           (illegible)     Large change
                               (illegible)          0-1000                Uncertain       Non-musical
                               (illegible)-∞        2000-∞                No change       Small change
                               9-∞                  750-∞                 No change       Sounds dull




noticeably affected and this is true even though the omitted 
overtones are all above the fifteenth. The high harmonics do 
not seem to be so essential for good quality in a soprano voice. 
An experimental test with only a few voices showed the rather 
unexpected result that the elimination of all the harmonic 
frequencies above 2000 cycles affected the musical quality of a 
bass, a baritone, or a contralto voice to a greater extent than
the quality of a high soprano voice. 

The table shows that the quality of the principal musical 
instruments is much more seriously affected by the elimination 
of the lower parts of their characteristic sound spectra than the 
quality of the sung vowels by a similar elimination. In any 
case such eliminations do not change the pitch, for this remains 
constant as long as the filtered sound can be recognized as a 
musical tone. 


These results were confirmed in a very striking manner by 
using ten separate vacuum tube generators for producing the 
component frequencies. These generators were adjusted to 
give the frequencies 100 cycles to 1000 cycles at intervals of 
100 cycles. They were all connected to a special telephone 
receiver and the currents regulated so that the pressure amplitudes of the components of the sound emitted by the receiver
were equal. By suitable switching arrangement any one of
the components could be eliminated. When they were all
impressed upon the receiver a full tone resulted which had a
definite pitch corresponding to 100 cycles per second. The
elimination of the 100-cycle component produced no noticeable
effect. The elimination of any other single component had no
effect upon the pitch and almost none upon the tone quality
although by careful listening, its introduction and withdrawal
could be detected in most cases. Even with the first seven
components eliminated, leaving only 800, 900, and 1000, the
pitch corresponded to a frequency of 100. When only two
components were left, they were heard as separate tones, the
fundamental subjective tone at 100 being still plainly audible
but much weaker than either component. Any three consecutive components were sufficient to give the tone a pitch corresponding to 100, as, for example, 200, 300, 400 or 600, 700, 800,
etc. When four consecutive components were sounded, the 
fundamental subjective tone was very prominent. When all 
of the components were sounded, this fundamental seemed to 
be louder than the other components and dominated the 
tone. 

The tests just described were made when the sensation level 
of the 700-cycle tone was at 90 db. When only three com- 
ponents, 700, 800 and 900, were used and the loudness of the 
combination was greatly decreased, it was found that the 100 
cycle subjective tone disappeared when the sensation level of 
the combination was approximately 45 db. At this level the 
three tones were heard as separate tones, the 900-cycle one 
being the last to disappear as the loudness approached zero. 
When five or more consecutive components were used, the 
pitch seemed to remain the same for low values of the loudness 
even down to zero, although for these very low values, it was 
very difficult to judge pitch. 

If the components 200, 400, 600, 800 and 1000 were used,
the pitch corresponded to 200 cycles, i.e., to the octave of the 
compound tone discussed above. This tone still had the same 
pitch when the 200- and the 400-cycle components were elimi- 
nated. Any two consecutive pairs gave the subjective tone 
200, but only very weakly. Combination 300, 600, and 900 
gave a harmonious sound, the listener having the tendency to 
hear the combination as separate musical tones. 

From the results which have been described one might con- 
clude that the pitch of a musical tone was determined by the 
common difference in the frequencies of the harmonics, rather 
than by the frequency of the lowest component. This conclu- 
sion suggested trying a combination of frequencies which are 
separated by a common difference, but which are not neces- 
sarily multiples of this common difference. The combination 
100, 300, 500, 700 and 900 was tried and it was found to have 
no definite pitch, but sounded like a noise. However, one 
could distinctly hear the subjective tone at 200 cycles. Similarly, the combinations 100, 400, 700, 1000 and 100, 500, 900
and 200, 500, 800 were tried and found to have no definite
pitch and to be entirely lacking in musical quality.
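The distinction drawn here, between components that merely share a common difference and components that are also whole multiples of that difference, can be stated compactly. The sketch below is only an arithmetic illustration of the combinations reported in the text; the function names are assumptions, and the test says nothing about how the ear arrives at its judgment.

```python
def common_difference(freqs):
    """Return the spacing if the sorted components are equally spaced, else None."""
    ordered = sorted(freqs)
    steps = {b - a for a, b in zip(ordered, ordered[1:])}
    return steps.pop() if len(steps) == 1 else None

def is_harmonic_segment(freqs):
    """True when every component is a whole multiple of the common difference."""
    d = common_difference(freqs)
    return d is not None and all(f % d == 0 for f in freqs)

# Combinations reported as having a definite pitch.
definite = [[800, 900, 1000], [200, 300, 400], [200, 400, 600, 800, 1000]]
# Combinations reported as having no definite pitch.
indefinite = [[100, 300, 500, 700, 900], [100, 400, 700, 1000],
              [100, 500, 900], [200, 500, 800]]

definite_flags = [is_harmonic_segment(c) for c in definite]
indefinite_flags = [is_harmonic_segment(c) for c in indefinite]
print(definite_flags, indefinite_flags)
```

Every combination reported as having a definite pitch passes the test, and every combination reported as pitchless fails it although its components are equally spaced.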

Further evidence of the above phenomenon was made pos- 
sible by means of a carrier telephone system which was avail- 
able in the laboratory. The technic of carrier telephony makes 
it possible to displace all the frequencies constituting a com- 
pound tone by the same absolute amount upward or downward. 
Thus, if the compound tone of ten components which has been 
described were transmitted through such a system when the 
carrier at the transmitting end differed from the carrier at the
receiving end by 30 cycles, the frequencies received would be
130, 230, etc., up to 1030. It is found that such a shift destroys
the musical quality which the original tone possessed. If the
fundamental is very predominant this shift raises the pitch,
but the inharmonic tones produce a harshness and the tone
loses its musical character. 


Structure of “Harmonic” Tones from Such Wind Instruments
as the Bugle

In this connection it is interesting to examine the structure 
of those tones produced on wind instruments by changes in the 
blowing intensity rather than by changes in the length of the 
vibrating air column. When the air pressure blowing an 
organ pipe or horn is continuously increased, the pitch of the 
emitted tone corresponds first to the fundamental and then 
suddenly jumps to that corresponding to the first overtone 
and then to that corresponding to the second overtone, etc. 
As is well known, it is this effect that makes it possible to 
produce the different notes on a bugle. One might expect to 
find all the harmonics of the fundamental in each of these notes. 
However, this would be contrary to the observations we have 
just described. In fact, we find from a few experiments upon
organ pipes that the overtones are all present in appreciable
amount when and only when the pitch of the tone is that
corresponding to the fundamental, but when the pitch corresponds to the first overtone, only those components which are
multiples of the first overtone are perceptible. This is clearly
shown in the sound spectra given in Fig. 122, obtained by
means of the harmonic analyzer. They represent the sound
emitted by an organ pipe when it was blown with the various
pressures indicated on the charts.

Fig. 122. — Spectra for Organ Pipe. (Abscissa: frequency, cycles per second.)









As shown in Part Three, Chapter I, during the transmission 
of a pure tone to the inner ear, a number of harmonics are 
introduced when the impressed tone is loud. Consequently, 
when such a tone is sensed on the basilar membrane, essentially 
the same nerve fibres are excited as when the impressed tone is 
complex. In the large majority of cases, then, there are a 
number of excited nerve regions on the basilar membrane 
located at positions corresponding to the harmonics when a 
musical tone is sensed by the ear. In this respect the loud pure 
tones produce similar effects to the loud complex tones. The 
relative intensity of stimulation at these various places varies 
with the harmonic content of the tone, but the positions remain 
fixed. It is very probable that the relative positions of these 
regions of stimulation, due to the harmonics — either objective 
or subjective — are the real determiners of the pitch. When 
the complex tones are at levels above 40 db most of the missing 
tones in the harmonic series are supplied as subjective tones. 
To illustrate this, the four charts in Fig. 123 have been drawn. 
The first chart represents the spectrum for the synthetic tone 
used in the experiments described above. If these components 
were transmitted through a system which was linear and which 
discriminated against the frequencies in the same manner as 
the ear mechanism, the spectrum which would arrive at the 
oval window would be that shown in chart (b). An estimate
of the inner ear spectrum which is produced when the non-linearity is present is shown in chart (c). This spectrum was
estimated by the methods described in Part Three, Chapter IV.
In chart (d) is shown an estimated inner ear spectrum for the
case when the first four components are eliminated from the
synthetic 100-cycle tone. From these figures it is evident
why this latter tone gives the same pitch and practically the 
same quality as when all the components are present. 
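A rough idea of how the missing components can be supplied subjectively is obtained by listing the simple difference frequencies between the components that remain. The sketch below makes a strong simplifying assumption: it considers only first-order difference tones between pairs of components and ignores their magnitudes, whereas the estimate in chart (d) was made by the methods of Part Three, Chapter IV, which are not reproduced here. The function name is hypothetical.

```python
def difference_tones(components):
    """All first-order difference frequencies |f_i - f_j| between pairs of components."""
    return sorted({abs(a - b) for a in components for b in components if a != b})

full_series = list(range(100, 1001, 100))          # the ten-component synthetic tone
remaining = [f for f in full_series if f > 400]    # first four components eliminated
missing = [f for f in full_series if f not in remaining]

supplied = difference_tones(remaining)
print("missing:", missing)
print("difference tones:", supplied)
```

Under this assumption each of the four eliminated frequencies reappears as a difference tone of the six components that are left.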

At the very low intensities, however, a spectrum similar to 
that shown in chart (b) must be impressed upon the inner ear.
At these low intensities the pitch remains the same even when 
the five lower components are eliminated. This seems to 
indicate that even though no stimulation was produced at the
position corresponding to 100 on the basilar membrane, the
pitch 100 is still recognized, due no doubt to the spacing of
the other components at positions corresponding to those
which would be produced by a fundamental of 100. It is also
evident from these charts that the quality of the musical tone

Figure 123. Four charts, each plotted against frequency from 0 to 2000 cycles: (a) Pressure spectrum of impressed tone; (b) Spectrum at auditory nerve terminals (linearity assumed); (c) Spectrum at auditory nerve terminals (actual); (d) Spectrum at auditory nerve terminals when first four components are eliminated from (a).


must change as the intensity increases. This is a well-known
fact and is important to remember when trying to reproduce 
music which gives the same effect as the original music. 

Before leaving this subject it should be pointed out that 
the time pattern theory of hearing outlined in this book will 



also account for these experimental results on pitch and at the
low intensities may be the main contributing factor. According to the nerve mechanism set forth in this theory the nervous
discharge due to an impressed sinusoidal stimulus will always 
take place at the same phase of vibration, but not at every 
vibration. Consequently when the four tones having fre- 
quencies of 400, 500, 600, and 700 cycles per second act upon 
the ear, the impulses in the auditory nerve will be timed some- 
what as follows. There will be certain fibres excited by the 
400-cycle tone which will be firing every 4th vibration, certain 
ones excited by the 500-cycle tone firing every 5th vibration, 
certain ones excited by the 600-cycle tone which will be firing 
every 6th vibration and certain ones from the 700-cycle tone
which will be firing every 7th vibration. These discharges will
all unite to form impulses in the auditory nerve having a time
interval of .01 second. Similarly a number of combinations
will unite to give an impulse at J, -|, i, J-, and | of this
interval. There will be discharges at other time intervals,
but the number of fibres causing them will be considerably less
than at the particular ones given above. It may be that the 
recognition of these time intervals by the brain aids in the 
recognition of pitch. 
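The common interval of .01 second follows directly from the arithmetic: a fibre firing on every n-th vibration of a tone of frequency f discharges once every n/f seconds. The sketch below works this out with exact fractions for the four tones named in the text; the function name is an assumption made for illustration.

```python
from fractions import Fraction

def firing_interval(frequency, every_nth):
    """Time between discharges of a fibre firing on every n-th vibration, in seconds."""
    return Fraction(every_nth, frequency)

# Tone frequency in cycles per second -> number of vibrations per discharge.
tones = {400: 4, 500: 5, 600: 6, 700: 7}
intervals = {f: firing_interval(f, n) for f, n in tones.items()}

for f, t in intervals.items():
    print(f, "cycles:", float(t), "second")
```

All four groups of fibres share the interval 1/100 second, the period of the 100-cycle fundamental that is absent from the stimulus.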



CHAPTER III 


Methods of Measuring the Recognition of Speech Sounds

In order to make a quantitative study of the effect of 
various kinds of distortion upon the average person’s ability 
to recognize the sounds of speech, it is necessary to have an
accurate method of measuring his ability. Many such methods 
have been proposed and used; they all are based upon the 
general method of pronouncing speech sounds into one end of 
a transmission system and having observers write the sounds 
which they hear at the receiving end. The comparison of the 
called sounds with those observed shows the number and 
kind of errors which are made. The differences arise in the 
technic of making such tests and in the type of speech material 
used. The system tested may be the simple system used in 
most conversations; namely, the air between the mouth and 
the ear in a room, or it may be a telephone system, or it may be
reproduced speech from a phonograph. In any case this 
method gives a quantitative measure of the understandability 
of the speech which arrives at the ear. 

Different types of speech material may be used as the test- 
ing material. It must be representative of speech and suit- 
able for making tests. These two aims at times come into 
conflict and the best compromise must be made. If the fundamental speech sounds are taken as a unit, the per cent of
correct letter sounds received is called “letter articulation,”
from which the use of the terms “vowel articulation,” “consonant articulation” and “sound articulation” (the articulation for a particular fundamental sound) becomes obvious.
If the syllable is used as a unit, the per cent correctly received
is called the syllable articulation. If a sentence is used as a
unit and is considered correctly recognized if the main thought 
is grasped by the observer, the per cent of such correctly 
received sentences is called “intelligibility.” 
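The definitions of syllable and letter articulation amount to simple percentages, as the following sketch shows. The scoring rule (a sound counts as correct only when the same sound is recorded in the same position, with called and recorded syllables of equal length) and the four sample syllables are assumptions made for illustration; they are not data from the tests described in this chapter.

```python
def articulation_scores(called, recorded):
    """Per cent syllable and letter articulation for paired called/recorded syllables.

    Each syllable is a tuple of speech sounds, e.g. ("g", "o", "b").
    A sound counts as correct only if the same sound is recorded in the same position.
    """
    syllables_right = 0
    sounds_right = 0
    sounds_total = 0
    for spoken, heard in zip(called, recorded):
        syllables_right += spoken == heard
        sounds_total += len(spoken)
        sounds_right += sum(s == h for s, h in zip(spoken, heard))
    return (100.0 * syllables_right / len(called),
            100.0 * sounds_right / sounds_total)

# Hypothetical record: two of the four syllables contain one wrong sound each.
called = [("g", "o", "b"), ("sh", "o", "l"), ("h", "a"), ("o", "n")]
recorded = [("g", "o", "d"), ("sh", "o", "l"), ("h", "a"), ("o", "m")]
syllable_pct, letter_pct = articulation_scores(called, recorded)
print(syllable_pct, letter_pct)
```

The letter articulation is always at least as high as the syllable articulation, since a single wrong sound spoils the whole syllable.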

In general, in this book where “intelligibility” is used, it 
refers specifically to the results obtained with the test sen- 
tences described below. 

Several different types of articulation testing lists have 
been constructed and used. The one which was used for 
making the tests mentioned in this book is known at present 
in the Bell Telephone Laboratories as “The Standard Articulation Testing List.” Most of the articulation data discussed
in this book were obtained with these lists. Only the very
simple syllable forms are used, namely, those composed of the
form CV (consonant-vowel), VC (vowel-consonant), and CVC
(consonant-vowel-consonant). All the possible combinations
of the English speech sounds given in Table I were constructed,
using these syllable forms. These syllables were arranged into
lists of 50 each, such that in each list there were 5 syllables of
the form CV, 5 of the form VC, and 40 of the form CVC.
Also, each group contained approximately the same number
of each of the fundamental speech sounds. To take care of
all of the possible syllable combinations required 174 such lists
or 8700 syllables. When using such lists in a test any possible combination of the speech sounds may occur so there can
be no tendency to memorize the syllables, such as happens if
words or peculiar types of syllables are used.

The syllables are written on cards which are shuffled each
time before they are used so that the order in which they are
pronounced is entirely haphazard. To illustrate the technic of
articulation testing, a sample list is given in Table XXVIII.

In the first column the syllable is given in its phonetic form. A
key word showing how each syllable is pronounced is given in 


partilkrT/of signific-ince except as a name for this 






d column. The syllables are pronounced into the 
the rate of one every three seconds. 

TABLE XXVIII

Speech-sound Testing List. List No. 160

No.  Speech sound   Key word               No.  Speech sound   Key word

 1   (illegible)    ho(t)                  26   gob            go + b
 2   (illegible)    hay                    27   shol           shoal
 3   (illegible)    wa(g)                  28   ros            rus(t)
 4   (illegible)    wi(th)                 29   jod            jo(g) + d
 5   (illegible)    vow                    30   bok            buck
 6   (illegible)    air                    31   zik            z + (d)ike
 7   (illegible)    e(bb) + z              32   bich           buy + ch
 8   (illegible)    you + sh               33   kith           ki(te) + th
 9   (illegible)    on                     34   (illegible)    gui(de) + t
10   (illegible)    (illegible)            35   yif            y + if
11   (illegible)    jow(l) + v             36   sin            sin
12   (illegible)    mou(nd) + sh           37   term           term
13   (illegible)    r + our                38   (illegible)    m + earl
14   (illegible)    z + (s)oothe           39   perv           p + (n)erve
15   (illegible)    who + s                40   yet            y + eat
16   (illegible)    ch + (p)ush            41   bel            b + eel
17   (illegible)    j + (f)oo(t) + m       42   zef            ze(al) + f
18   (illegible)    th + (s)oo(t) + p      43   weng           whe(n) + ng
19   (illegible)    foo(t) + ch            44   kev            k + ev(er)
20   (illegible)    wa(ll) + ng            45   hang           hang
21   (illegible)    cha(lk) + th           46   (illegible)    p + (r)ag
22   (illegible)    ta(ll) + j             47   yas            y + ace
23   (illegible)    k + aug(er)            48   dap            d + ape
24   (illegible)    (tele)phone            49   yang           ya(cht) + ng
25   (illegible)    dose                   50   lan            l + on


In Table XXIX is given a sample of the records made by
the observers. This table gives the results obtained when the
syllables were transmitted over a system transmitting only frequencies as high as 1250 cycles per second. The correct word
is shown opposite the syllables which were recorded incorrectly. The errors for each of the fundamental sounds were
taken from this original sheet and recorded on an analysis
sheet as shown in Table XXX. This table gives the average
results of eight observers and two callers for the same system. 
As seen from Table XXX the average of the eight observers
and the two callers gives 41.2 per cent for the syllable articulation. The letter articulation was 72.2 per cent. The consonant articulation was 65.8 per cent and the vowel articulation 83.4 per cent.

TABLE XXIX

Transmission Branch Articulation Test Recording Sheet

It has been found that when using these lists for testing 
systems which have an articulation of the order of 70 per cent,
the probable error in the per cent articulation of an observation
varies from ± 4 to ± 7 depending upon the crew (observer and
caller). An observation used in this sense means the per cent
articulation obtained by one crew using fifty of the syllables
or one list. It has been a common practice to use ten lists
instead of one, which reduces this error to approximately ± 2
per cent. If different callers and observers are used for each
of the ten lists, it will be found that the probable error computed from the observer’s chart will then be about twice the
figure given above, namely, ± 4 per cent. In order to reduce
the observational error to approximately 1 per cent, ten persons
on the testing team are necessary. If each one of the ten calls
to the remaining nine, an equivalent of ninety crews or ninety
observations will result. If the average of these is taken, it
will have a probable error which is not more than 1 per cent.
It is obvious that the observational error computed in this way 
does not indicate the magnitude of systematic errors such as
are due to memory, practice, familiarity with the circuit being 
tested, etc. The best method of obtaining articulation values 
which are comparable for several systems is to arrange the 
testing so that the team has the same practice for each system. 
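The reduction obtained by using ten lists with one crew is about what would be expected if the observations were independent, in which case the probable error of an average falls as the square root of the number of observations. The sketch below applies that rule to the single-list figures of ± 4 and ± 7; the independence assumption and the function name are ours, and, as the text notes, the rule takes no account of differences between crews or of systematic errors.

```python
import math

def probable_error_of_mean(single_observation_pe, observations):
    """Probable error of the average of n independent observations (pe / sqrt(n))."""
    return single_observation_pe / math.sqrt(observations)

# One crew, ten lists: single-list probable errors of 4 and 7 per cent.
ten_lists = [probable_error_of_mean(pe, 10) for pe in (4.0, 7.0)]
print([round(v, 1) for v in ten_lists])
```

The results bracket the figure of approximately ± 2 per cent quoted for ten lists.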

As would be expected, the articulation for a given sound 
depends somewhat upon its position in the syllable, that is, 
upon the letter sound preceding and following it. An analysis 
of results obtained with the standard lists showed that 42 per 
cent of the errors were initial consonants and 58 per cent were 
final consonants. The vowel errors for initial and final vowels 
were the same. Some consonants were recognized more easily 
after certain vowels than after others. However, the influence 
of a certain vowel upon the consonants as a class is the same 
as another vowel, so statistically a letter sound can be regarded 


TABLE XXX

Summary Sheet: Average Errors above 2%. Transmission Branch Articulation Test Analysis Sheet (vowel sounds called against sounds recorded, with the per cent error for each sound).


as independent of the other sounds from a recognition stand- 
point, provided the following conditions are fulfilled in making 
up testing lists: 


TABLE XXX (Continued)

Summary Sheet: Average Errors above 2%. Transmission Branch Articulation Test Analysis Sheet (consonant sounds called against sounds recorded, with the per cent error for each sound).

Total number of errora Word Articulation 4/-g 

^^Bonant Articulation 


1. There shall be no context between sounds; that is, woi-ds 

and sentences in general are excluded. 

2. The letter sounds shall occur approximately the same 

number of times. 













262 


SPEECH AND HEARING 


3. The majority of possible letter combinations into the 

simple syllable forms shall be used. 

4. The number of initial consonants must be approxi- 

mately the same as the final consonants. 

When these conditions are fulfilled, it is possible to calculate 
the syllable articulation from the consonant and vowel articu- 
lation. 

When it is inconvenient to train the observers so that they 
will both pronounce and also write without error the phonetic 
syllables, then to determine the vowel and consonant articula- 
tion use may be made of simple word lists. The words in these 
lists must be chosen so that the memory effects are reduced to 
a minimum. The lists shown in Table XXXI were constructed
to fulfill these conditions. In the vowel list the vowel is placed 
either between the two consonants b and t, or between b and k. 
With such lists the vowel must be interpreted correctly before 
the word can be identified. Similarly, in the second list the 
consonant is either followed by the vowel I or preceded by wi. 
This makes it impossible to identify the word unless the con- 
sonant is heard. When using these lists the words are written 
on cards which are shuffled so that the order is different in
successive tests. The word is marked right or wrong on the basis
of the single letter sound and consequently the word
“articulation” in this case becomes “letter articulation.”
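The card procedure just described can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the four words in the usage note are a made-up trial, not recorded data.

```python
import random


def shuffled_cards(words, rng=None):
    """Return the word cards in a fresh random order for one test."""
    rng = rng or random.Random()
    cards = list(words)          # copy, so the master list is untouched
    rng.shuffle(cards)
    return cards


def letter_articulation(called, recorded):
    """Per cent of words recorded correctly.

    Each word in these lists can only be identified through a single
    letter sound, so one correct word counts as one correct letter sound.
    """
    if len(called) != len(recorded):
        raise ValueError("called and recorded lists must have equal length")
    correct = sum(c == r for c, r in zip(called, recorded))
    return 100.0 * correct / len(called)
```

For instance, if "bat", "bait", "bet", "beat" are called and the observer writes "bat", "bet", "bet", "beat", three of the four vowels were heard correctly and the function returns 75.0.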

In research work which requires a large amount of testing, 
the standard syllabic lists are preferable for two reasons. They 
enable observers to obtain data at a faster rate, for it is readily 
seen that in pronouncing 100 syllables from the standard lists, 
280 letter sounds are pronounced, while in pronouncing words 
from the word list only 100 letter sounds are pronounced, which 
are counted in the test. Therefore, considerably more time is 
required to obtain the same accuracy with the word lists as with 
the standard syllabic lists. Also the standard lists place each 
letter sound adjacent to another letter sound In a manner more 
nearly like that occurring in ordinary speech than obtained 
with the word lists. 
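The figure of 280 letter sounds follows from the make-up of the standard lists stated later in this chapter: 20 two-sound syllables and 80 three-sound syllables in each 100. A minimal check, with illustrative names that are not part of the original text:

```python
def letter_sounds_per_list(two_sound_syllables=20, three_sound_syllables=80):
    """Letter sounds pronounced in one standard list of 100 syllables."""
    return 2 * two_sound_syllables + 3 * three_sound_syllables


standard = letter_sounds_per_list()   # 2*20 + 3*80
word_list = 100                       # one counted letter sound per word
ratio = standard / word_list          # data gathered per item called
```

On this count the standard lists yield 2.8 times as many scored letter sounds per item called as the word lists.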



TABLE XXXI

Vowel Word List (English Words)

Vowel Sound      English Words in the List         Number
a  (bat)         bat, bat, back, back                 4
a  (bait)        bait, bait, bake, bake, bake         5
e  (bet)         bet, bet, beck, beck                 4
e  (beat)        beat, beat, beak, beak               4
i  (bit)         bit, bit, bit, bit                   4
i  (bite)        bite, bite, bite, bike, bike         5
o  (but)         but, but, buck, buck                 4
o  (bought)      bought, bought, balk, balk           4
o  (boat)        boat, boat, boat, boat               4
u  (book)        book, book, book, book               4
u  (boot)        boot, boot, boot, boot               4
ou (bout)        bout, bout, bout, bout               4

Total number of words in list                        50

The vowel articulation is the percentage of the vowel sounds correctly perceived.


When only a limited amount of testing is done and that 
with a personnel who are untrained in writing the phonetic 
sounds, the word lists are very useful. This is particularly 
true when testing the relative quality of sets for aiding the 
deafened, for in these instances it is desirable that the observers 
be deafened persons and consequently they are usually un- 
trained in observing and unfamiliar with phonetic markings. 

As stated above, the intelligibility of a system transmitting 
speech is defined as the per cent of ideas expressed in the form 
of simple test sentences which, after transmission, are correctly 
understood by a number of observers. It is probable that the 
intelligibility is more directly related to the thing which it is 
desired to measure than is articulation. However, it is a much 
more difficult quantity to measure. It is evident that due to 
memory effects, a set of sentences can be used with the same 
personnel only a very few times. Also, the psychological fac- 
tors become more prominent when using sentences than when 
using simple syllables. 

However, a set of simple interrogative and imperative sen- 
tences was compiled so as to obtain the approximate relation 
between the articulation as determined by the standard lists 
and the intelligibility as determined by the simple test sen- 
tences. In making up these test sentences the list was designed 
to test the observer’s acuteness of perception and to minimize 
demands upon his intelligence. The questions are of a self- 
evident nature, the answers being frequently implied in the 
questions. They vary in length from about five to twelve or 
more words, each sentence containing four or five “thought” 
words. These “thought” words must be correctly received in 
order to understand the idea of the sentence. Various topics 
covered by ordinary conversation are represented, including 
personal experiences and points of interest in politics, science, 
and commerce. An effort was made to eliminate duplicate 
ideas. In only a few instances were the ideas repeated and in 
those cases the manner of expressing them was varied. In 
this manner forty-nine lists of fifty sentences each were com- 
piled. A sample of one of these lists is shown below. 



METHODS OF MEASURING SPEECH SOUNDS 265 


INTELLIGIBILITY LIST 
List 1

1. Name a prominent millionaire of the country. 

2. How large is the sun compared with the earth? 

3. Why are flagpoles surmounted by lightning rods? 

4. Give the abbreviations for January and February. 

5. Name the tree on which bananas grow. 

6. How often does the century plant bloom? 

7. What description can you give of the bottom of the ocean? 

8. Explain the difference between a hill and a mountain. 

9. What is the chief purpose of industrial strikes? 

10. Describe the shoes of the native Hollander. 

11. Name some uses to which electricity is put. 

12. What would cause the air to escape from a bicycle tire? 

13. Where is more grain raised, in the East or the West? 

14. Tell what is meant by an Indian Reservation. 

15. For what invention is Thomas Edison noted? 

16. Name a state which has no seacoast. 

17. Write the Roman numeral ten. 

18. Explain the difference between export and import. 

19. Explain why a corked bottle floats. 

20. What substance is a good conductor of electricity? 

21. Explain why Indians were afraid of firearms. 

22. Explain the purpose of fire drills. 

23. At what time do ocean waves become dangerous? 

24. What medicine would you take to remedy indigestion?

25. What knowledge is covered by the study of astronomy? 

26. Name a good restaurant in this vicinity. 

27. What is the importance of large windows in stores? 

28. Explain why a giraffe eats the foliage of trees. 

29. How are the pages of a magazine held together? 

30. Explain why the name string-bean is appropriate. 

31. Name a nearby city in which there is a shipyard. 

32. Name a fruit which grows in bunches. 

33. Which of our Presidents went to South Africa? 

34. Why are wire springs used in beds?

35. Why are books bound in stiff covers? 

36. Why did the home people conserve food during the war? 

37. Name an insect that has a hard shell. 

38. What symbol on the United States money stands for liberty? 

39. What weapons did the Indians use in warfare? 

40. In what kind of weather does milk sour? 





41. What streets in this city have Dutch names? 

42. How does turning a ship's wheel steer the ship?

43. What nation aided us in the Revolutionary War?

44. What are some personal characteristics of the people of Japan?

45. What candy is black and good for colds?

46. Name a famous Indian Tribe.

47. Why is this building lighted by reflected light?

48. Why are most lighthouses situated on rocks? 

49. Give some ingredients used in soap. 

50. Why is a house built of stone superior to others? 



Fig. 124. Relation between Distortion and Recognition.

In order to obtain an experimental relationship between 
the intelligibility and the syllable articulation, tests were made 
on eight transmission systems which gave syllable articulations 
varying from 5 per cent to 98 per cent. Six observers were 
used in the work and 1800 sentences were used to determine 
the intelligibility for each condition. Similarly, 1800 syllables 
were used to determine the articulation for each condition.
The results obtained from these tests are shown in Fig. 124.
In this figure distortion is taken as 100 minus the syllable
articulation as obtained by the standard lists. This definition
is arbitrary, but its choice helps to bring out the relationships
existing between the various ways of measuring the recognition
of distorted speech. The papers of the observers were corrected
not only on the basis of syllables, but also on the basis
of the letter sounds so that the letter articulation was obtained.
This is also shown in the figure. After a testing crew has had
some practice at listening to sentences and syllables, it becomes
more efficient in recognizing the speech sounds. However, the
data indicated that within the observational error the improvement
is such that the points will still remain on the curve;
that is, if the articulation shows an improvement from 5 to
10 per cent the intelligibility will show an improvement
from 20 to 38 per cent as shown by the curve. It will be seen
that for distortions greater than 80 per cent a change of 10
per cent distortion is equivalent to a change of approximately
40 per cent in the intelligibility, while an equal distortion
change for distortions below 20 per cent corresponds to less
than 1 per cent change in the intelligibility. For this reason
these test sentences are useful for testing systems having very 
large distortions but are of little value for testing ordinary 
transmission systems. 
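As a rough illustration of the two regions just quoted, the average slope of the intelligibility curve can be estimated from the stated changes. The symbols D for distortion and I for intelligibility are introduced here only for convenience and do not appear in the original.

```latex
\left|\frac{\Delta I}{\Delta D}\right|_{D > 80} \approx \frac{40}{10} = 4,
\qquad
\left|\frac{\Delta I}{\Delta D}\right|_{D < 20} < \frac{1}{10} = 0.1
```

On these figures the sentence test responds more than forty times as strongly to a given change of distortion in the first region as in the second, which is why it discriminates poorly among ordinary systems.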

In order to obtain a notion of how the time of transmitting 
an idea correctly over the system will vary with various 
amounts of distortion, the caller asked a question and the 
observer was instructed to reply orally. A record of the time 
was kept from the instant the caller began his sentence until 
a satisfactory reply had been received. Both caller and 
observer had been previously instructed to carry on a con-
versation over the system as they ordinarily would. The
observer could ask the caller to have the sentence repeated,
reworded, or any of its difficult words spelled until the caller
was satisfied that his question had been understood. The
ratio of the time required on the high quality system to that
required on any other system is called for convenience the
conversational efficiency of the latter system. Figure 125
shows the type of results thus obtained. It is seen that a
system which is to transmit correctly at least 9 out of 10 test
sentences must have at least 36 per cent syllable articulation.
Such a system will have a conversational efficiency of 70 per
cent, that is, only 70 per cent as many test sentences can be
transmitted over such a system and be correctly understood
in a given time as would be possible with an ideal system.
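The definition of conversational efficiency amounts to a simple ratio of times. The sketch below is an illustration only; the function name and the timings in the usage note are hypothetical and are not measurements from these tests.

```python
def conversational_efficiency(time_high_quality, time_system):
    """Conversational efficiency in per cent.

    time_high_quality: time to get a question answered satisfactorily
                       over the high quality (reference) system
    time_system:       time for the same task over the system under test
    Both times must be in the same unit.
    """
    if time_high_quality <= 0 or time_system <= 0:
        raise ValueError("times must be positive")
    return 100.0 * time_high_quality / time_system
```

If a question takes 7 seconds over the reference system and 10 seconds over the system under test, the efficiency comes out at 70 per cent, the value quoted above for a system of 36 per cent syllable articulation.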



Fig. 125. Relation between Letter and Syllable Articulation.


Articulation and the Theory of Probability

Since the articulation is the number of successful trials out
of 100 attempts at guessing the correct letter or syllable, it can
be considered as a probability. If C, V, L, and S are the con-
sonant, the vowel, the letter, and the syllable articulations, then
$C/100$, $V/100$, $L/100$, and $S/100$ give the chance that a
consonant, a vowel, a letter sound, or a syllable will be recorded
correctly.

In the standard lists the chance of recording a syllable of
the form consonant-vowel or vowel-consonant correctly is
obviously then $CV \times 10^{-4}$. Similarly, the chance of recording
correctly the syllable of the form consonant-vowel-consonant
is $CVC \times 10^{-6}$. Since there are 20 of the first and 80 of
the second type of syllables in each 100 syllables in the stand-
ard lists, the number of correctly recorded syllables when a
list of 100 is pronounced is $20\,CV \times 10^{-4} + 80\,CVC \times 10^{-6}$,
or the syllable articulation $S_0$ is given by

$$S_0 = 20\,CV \times 10^{-4} + 80\,CVC \times 10^{-6} \qquad (1)$$

If we are interested in the letter articulation L, then L can be
substituted for both C and V and the above formula becomes

$$S_0 = 2L^2 \times 10^{-3} + 8L^3 \times 10^{-5} \qquad (2)$$

The validity of this procedure is dependent upon the conditions 
outlined on page 249 being fulfilled. For most of the lists of 
meaningless syllables which have been proposed for use it is 
justified. For a list which is made up so as to have syllable 
forms occurring with the same frequency as in written speech 
a different formula will relate L and S. 
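Equations (1) and (2) are easily evaluated numerically. The following is a minimal sketch under the stated assumption of 20 two-sound and 80 three-sound syllables per list; the function names are illustrative and the sample values in the note are not experimental data.

```python
def syllable_articulation(c, v, n_two=20, n_three=80):
    """Equation (1): syllable articulation S0 in per cent.

    c, v:    consonant and vowel articulations in per cent
    n_two:   syllables of the form CV or VC per 100 syllables
    n_three: syllables of the form CVC per 100 syllables
    """
    pc, pv = c / 100.0, v / 100.0          # chances of a correct record
    return n_two * pc * pv + n_three * pc * pv * pc


def syllable_articulation_from_letter(l):
    """Equation (2): S0 when the letter articulation L replaces C and V."""
    return 2e-3 * l ** 2 + 8e-5 * l ** 3
```

With C = V = L = 50 per cent both forms give 15 per cent, and with perfect letter articulation both give 100 per cent, as they must.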

These relations have been confirmed by a large amount of 
experimental data. In Fig. 125 is shown a curve between 
L and So as calculated from equation (2). The dots represent 
the experimental results. In these results the letter articula- 
tion was obtained from the same data as the syllable articula- 
tion, obtained with standard lists as explained in the beginning 
of this chapter. 

1 This relation was first pointed out by J. Q. Stewart.



CHAPTER IV 


Effect of Changes in the Received Intensity of Speech 
Sounds upon Their Recognition 

In order to study the effect of changes in intensity upon the 
recognition of speech it is necessary to obtain the speech 
sounds at varying degrees of intensity. One means of doing 
this is to vary the distance between the speaker and the 
listener. There are various objections, however, to using this
method. As pointed out in Part Three, Chapter VI, it is
impossible to secure the range of intensities required by varying
the distance because of the large distance required. It would
require distances larger than 1000 feet to reduce the intensity
of the average voice to the threshold of audibility. Under such 
conditions, it is very difficult to control the interfering noises 
and also the reflections which produce distortion. 
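The order of magnitude of the distance can be checked under an idealized assumption that is not made in the text: free-field spreading, in which the intensity falls as the inverse square of the distance. The reference conditions used in the note (zero level at 1 inch from the lips, threshold 100 db lower) are those given later in this chapter; the function names are illustrative.

```python
import math


def level_change_db(distance, reference_distance):
    """Change of intensity level on moving from reference_distance to distance,
    assuming inverse-square spreading (an idealization)."""
    return -20.0 * math.log10(distance / reference_distance)


def distance_for_drop(drop_db, reference_distance):
    """Distance at which the level has fallen drop_db below its value
    at reference_distance, under the same assumption."""
    return reference_distance * 10.0 ** (drop_db / 20.0)


# 100 db drop starting from 1 inch (1/12 foot), answer in feet
feet = distance_for_drop(100.0, 1.0 / 12.0)
```

On this idealized basis the required distance is several thousand feet, consistent with the statement that more than 1000 feet would be needed; reflections and noise in any real space would alter the figure.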

Consequently, a telephone system was constructed which 
reproduced speech with practically no distortion. It was 
arranged so that by means of distortionless attenuators, the 
intensity of the reproduced sounds could be varied through a 
very wide range. A schematic of this telephone system is 
shown in Fig. 126. As indicated, its essential elements are a 
condenser transmitter to receive the speech waves and transform
them into the electrical form, an amplifier for magnifying
the intensity of the electrical speech currents, an attenuator
for controlling the intensity, an equalizing network, and a
receiver for delivering the speech to the ear. The attenuator
consists of a system of electrical resistances arranged so that
the amplitude of the speech waves could be reduced in steps
to as low as one-millionth of their maximum values. The
equalizing network is an arrangement of resistances, condensers,
and inductance coils, having a frequency selectivity which is
the complement of that of the rest of the system. In other
words, its introduction into the system compensated for the
lack of even response in other parts of the system, particularly
in the receiver.
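The range of the attenuator can be expressed in decibels by the usual relation between an amplitude ratio and a level difference. A short sketch, with an illustrative function name:

```python
import math


def amplitude_ratio_to_db(ratio):
    """Level difference in db corresponding to an amplitude ratio."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 20.0 * math.log10(ratio)


# amplitude reduced to one-millionth of its maximum value
full_range_db = amplitude_ratio_to_db(1e-6)
```

A reduction to one-millionth in amplitude corresponds to about 120 db, which comfortably covers the span between very loud speech and the threshold of audibility.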

Fig. 126. High Quality Telephone System.

Articulation tests were made with the standard lists described
in Chapter III, using this high quality system when it was set
to deliver various intensities from the threshold of audibility
to very high values. In these tests two callers and eight
observers were used and five lists called by each caller.
Consequently, each of the points represents the results obtained
from the records of 8000 syllables. The results are shown
by the curve in Fig. 127. The differently shaped points
correspond to data taken with different teams having different
training. These data were taken at times separated by intervals
as much as a year. The abscissas give the intensity level of
average speech. As shown in Part Three, Chapter III, the zero
on this scale is the intensity level existing at the ear when a
speaker talks with average conversational intensity with his
lips 1 inch away from the ear. It is also the intensity level
corresponding to a flow of speech energy in a free wave of
one microwatt per square centimeter. As the various speech
sounds are pronounced during a conversation, their intensities
fluctuate about this average level. It is seen that the curve
strikes the intensity axis at an intensity level of -100 db.
This point also corresponds to the threshold of audibility for
average speech. This threshold is determined by the loudest
vowel sound, namely, the sound o as in “awl.” For convenience
a scale of sensation level is also given at the top of Fig. 127.
[words illegible] level for best interpretation is at -30, or at a
sensation level of 70 db. As the sensation level [words illegible]
the articulation rises rapidly [passage illegible] decrease. It is
interesting to note that for a [remainder of sentence illegible].

Fig. 127. Articulation vs. Intensity of Received Speech.

When reproduced speech is decreased to an intensity value
near the threshold of audibility, the loudest components are
in the frequency range from 700 to 1500 cycles (see Fig. 82).
For this frequency range the threshold intensity level $a_0$ is -93.
The reason why speech can be reduced to an intensity level
of -100 before it becomes inaudible is that the loudest
speech sounds have a level which is 6 to 10 db greater than
the average level.
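The two scales and the threshold argument above reduce to simple arithmetic. The sketch below uses only the figures quoted in the text (-100 db threshold for average speech, -93 db threshold in the 700 to 1500 cycle range, peaks 6 to 10 db above the average); the function names are illustrative.

```python
THRESHOLD_LEVEL_DB = -100.0   # threshold of audibility for average speech


def sensation_level(intensity_level_db, threshold_db=THRESHOLD_LEVEL_DB):
    """Sensation level: db above the threshold of audibility."""
    return intensity_level_db - threshold_db


def average_level_at_threshold(band_threshold_db, peak_excess_db):
    """Average speech level at which the loudest sounds, lying
    peak_excess_db above the average, just reach the band threshold."""
    return band_threshold_db - peak_excess_db


best = sensation_level(-30.0)                      # optimum level
upper = average_level_at_threshold(-93.0, 6.0)     # peaks 6 db above average
lower = average_level_at_threshold(-93.0, 10.0)    # peaks 10 db above average
```

An intensity level of -30 thus corresponds to a sensation level of 70 db, and the two limiting cases bracket the observed speech threshold of -100 db.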

The articulation data were analyzed to determine the articu- 
lation for each of the fundamental sounds. The curves shown 
in Figs. 128, 129, 130, 131, 132, and 133 give the results for
each of the fundamental sounds. The abscissas are given in
terms of the intensity level for average speech rather than for
the speech sound itself. The threshold intensity level for each
of the speech sounds as determined by the method described in
Part One, Chapter III, is indicated by the small arrow. It was
from these articulation data that the figures given in the last
column of Table VIII were calculated. The sounds in each
of the groups have similar characteristics from a recognition
standpoint.

Fig. 128. Articulation vs. Intensity Level.

In Table XXXII are shown the results of the articulation
tests for the range of intensities usually used in conversation.
As a check against these results obtained with the high-quality
telephone circuit, tests were made with the observers stationed
at 3 feet away from the speaker, thus permitting the speech
sounds to be transmitted through the air. These tests checked
the results given in Table XXXII within the observational
error. Instead of the articulation, 100 minus the articulation,
or the articulation error, is given. The speech sounds are
arranged according to the magnitude of the articulation error
and, consequently, according to the relative difficulty of
recognizing them. It will be noticed that the consonants are usually
harder to recognize correctly than the vowels. However, the
speech sounds e and l, r, ng form notable exceptions to this
rule since the former is among the most difficult, while the
latter are among the very easiest speech sounds to recognize
at normal intensities. At all intensities, the sounds th, f, and
v are the most difficult to recognize. The sound z, which is
readily recognized at normal intensities, becomes very difficult
at weak intensities. The sounds i, ou, er, and o (as in talk) are
missed less than 10 per cent of the time even when very near the
threshold value for average speech. It is seen from Table
XXXII that for intensities commonly used in conversation
the sounds v, f, and th account for more than half of the mistakes
in the recognition of the fundamental speech sounds.

Fig. 129. Articulation vs. Intensity Level.

Fig. 130. Articulation vs. Intensity Level.

Fig. 131. Articulation vs. Intensity Level.

Fig. 132. Articulation vs. Intensity Level.

Fig. 133. Articulation vs. Intensity Level.

TABLE XXXII

Articulation Error or the Per Cent of Times the Sound is Misinterpreted

Speech   Key               Speech   Key
Sound    Word    Average   Sound    Word    Average
l        look      0.2     g                  1.0
i        time      0.2     a        top       1.1
ou       town      0.3     b        bail      1.1
ng       sing      0.3     n        no        1.1
r        red       0.3     e        team      1.2
z        zest      0.3     h        hat       1.2
er       term      0.4     sh       ship      1.5
y        you       0.4     a        tap       1.5
o        tone      0.5     ch       cheap     1.8
d        day       0.5     s        say       1.8
i        tip       0.6     k        keep      1.8
t        ten       0.6     u        tool      2.2
m        man       0.7     u        took      2.5
j        jump      0.8     p        pay       2.5
o        ton       0.8     e        ten       2.8
o        talk      0.8     v        view      3.9
w        we        0.9     f        fill     12.7
a        take      1.0     th       then     17.3

There is a characteristic difference between the shape of the
curves for the vowels and for the consonants. For the former
the curves run along horizontally and then drop off very
abruptly. For the latter the drop is more gradual. The
threshold points marked on the axis of abscissas do not
correspond in general to zero articulation. A consonant sound may
sometimes be identified by the modification produced on the
following or preceding vowel even though it is below the
threshold as determined by an isolated sound. It might seem
logical to consider this modification of the vowel as part of the
consonant. If it is so considered, then it is evident that as
long as the vowel is heard there is always a chance of identifying
the consonant preceding or succeeding it, and consequently
the threshold of a consonant so considered will be the same
as that for the vowel. It is for this reason that all of the curves
seem to go through the same zero articulation point. For
example, it is seen that, for the sounds in Figs. 133 and 134,
the articulation is still above zero when the characteristic part
of the consonant is 10 or 15 db below the threshold for the
isolated sound. The vowels should have zero articulation
points corresponding to their respective threshold values.
The curves do not extend far enough to verify this fact. The
only vowel curve which does not seem to satisfy this condition
is that for the sound e. However, Table X shows that the
average phonetic power of this sound is about one-third that
of the sound o (as in talk) and consequently its threshold value
should be only about 5 db different from that of o (as in talk)
instead of 10 db. The former value agrees with the articulation
curve.
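The figure of about 5 db follows from the usual relation between a power ratio and a level difference. A one-line check, with an illustrative function name:

```python
import math


def power_ratio_to_db(ratio):
    """Level difference in db corresponding to a power ratio."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 10.0 * math.log10(ratio)


# one sound carrying about three times the phonetic power of the other
difference_db = power_ratio_to_db(3.0)
```

A power ratio of three gives a little under 5 db, whereas a difference of 10 db would require a power ratio of ten.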



CHAPTER V 


Effect of Frequency Distortion upon the Recognition
of Speech Sounds

When pure tones having equal intensities are produced 
successively in front of the transmitter a transmission system 
is said to have frequency distortion if unequal intensities are 
produced at the receiving end. It is evident, however, that 
if this lack of uniform response to different frequencies exists 
for those frequencies only which are either below or above the 
hearing range, no distortion will be noticeable to the ear. 

To determine the importance of the various frequencies for 
carrying the properties which determine the recognition of 
sound, the following experimental tests were performed. By 
means of the high-quality telephone system illustrated in Fig.
126, speech sounds were converted into electrical waves. These
electrical waves were sent through electrical filters which had
the property of transmitting only certain frequency ranges.
One type of filter known as the “low pass” filter transmits
only frequencies below a certain limit, a schematic diagram of
the circuit arrangement to produce this effect being shown in
Fig. 134. The other type of filter known as the “high pass”
filter passes only those frequencies above a certain limit, the
circuit arrangement for producing this effect being shown in
Fig. 135. By means of the arrangement of coils and con-
densers shown in these two figures, the amplitudes of those
frequencies outside of the band which we desire to transmit
are reduced to less than 1/1000 of their normal value while those
in the band are only slightly changed.¹ By means of a switching
mechanism various values can be given to the coils and
condensers so as to make the limiting frequency at any desired
value.

¹ For a complete theory of this action of electrical filters, see Chapter XVI of the book
by K. S. Johnson entitled “Transmission Circuits for Telephone Communication.”

Fig. 134. Low Pass Filter.
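The action of the two filter types on the components of a sound can be sketched in idealized form. This is not a model of the coil and condenser networks themselves: it simply passes components inside the band unchanged and reduces those outside it by a fixed factor. The function name is hypothetical, and the default factor of 1/1000 is an assumed reading of the attenuation figure quoted above.

```python
def apply_filter(components, cutoff_hz, kind="low", stopband_gain=1e-3):
    """Idealized low pass or high pass filter acting on spectral components.

    components:    list of (frequency_hz, amplitude) pairs
    cutoff_hz:     limiting frequency of the filter
    kind:          "low" passes frequencies below the cutoff,
                   "high" passes frequencies above it
    stopband_gain: factor applied to amplitudes outside the band
                   (assumed value; a component exactly at the cutoff
                   is treated as outside the band)
    """
    if kind not in ("low", "high"):
        raise ValueError("kind must be 'low' or 'high'")
    filtered = []
    for freq, amp in components:
        in_band = freq < cutoff_hz if kind == "low" else freq > cutoff_hz
        filtered.append((freq, amp if in_band else amp * stopband_gain))
    return filtered
```

Applied with a 1000 cycle cutoff to two components of equal amplitude at 500 and 2000 cycles, the low pass form leaves the first untouched and all but removes the second, while the high pass form does the reverse.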


Articulation tests were made, using various filter combina-
tions. The solid curves in Fig. 136 show the results of these
tests. The ordinates give the syllable articulation and the
abscissas give the cut-off or limiting frequency of the filter.
For example, on the curve labelled “Articulation L” the point
(1000, 40) means that a system which transmits only frequencies
below 1000 cycles per second has a syllable articulation
of 40 per cent. Similarly, on the curve labelled “Articulation
H” the point (1000, 86) means that a system which transmits
only frequencies above 1000 cycles per second has a syllable
articulation of 86 per cent. The dotted curves in this figure
show the per cent of the total speech energy which is trans-
mitted through such filter systems. These curves were derived
from the data given in Part One, Chapter III. In obtaining
these results the intensity of the received speech was adjusted
by means of an attenuator in the circuit so that a maximum
articulation was obtained for each filter system.

Fig. 136. Effect upon the Articulation and Energy of Speech of Eliminating
Certain Frequency Regions.

It will be seen that although the fundamental chord tones 
with their first few harmonics carry a large portion of the 
speech energy and are important from the standpoint of the 
naturalness of the reproduced speech, they carry practically 
none of the properties which determine the correctness with 
which the speech sounds are understood. A filter system which 
eliminates all frequencies below 500 cycles per second elimi- 
nates 60 per cent of the energy in speech, but only reduces 
the articulation 2 per cent. A system which eliminates fre- 
quencies above 1500 cycles per second eliminates only 10 per 
cent of the speech energy, but reduces the articulation 35 per 
cent. A system which eliminates all frequencies above 3000 
cycles per second has as low a value for the articulation as one 
which eliminates all frequencies below 1000 cycles per second. 
This last statement may appear rather astonishing since it is 
contrary to the popular notion of the relative importance of 
various voice frequencies. In this connection it should be 
pointed out that the articulation is not a measure of the 
naturalness of the reproduced speech. Although a system 
transmitting only those frequencies above 1000 cycles will give 
an articulation of 85 per cent, a degree of recognition that for 
many purposes might be satisfactory, the speech reproduced 
by it will sound very peculiar, its naturalness being destroyed. 
As indicated, the elimination of frequencies below 500 cycles 
produces only a small effect upon the articulation but it pro- 
duces a much larger effect upon the naturalness. 
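
The contrast drawn above can be restated as simple arithmetic. The sketch below only tabulates the two pairs of figures quoted in the text (energy removed and articulation lost); the names `filters` and `loss_per_unit_energy` are illustrative and do not come from the original.

```python
# Figures quoted in the text for two filter systems.
# Each entry: (per cent of speech energy removed,
#              per cent reduction in articulation).
filters = {
    "high pass, 500-cycle cut-off": (60, 2),
    "low pass, 1500-cycle cut-off": (10, 35),
}


def loss_per_unit_energy(energy_removed, articulation_loss):
    """Articulation points lost for each per cent of energy removed."""
    return articulation_loss / energy_removed


for name, (energy, artic) in filters.items():
    ratio = loss_per_unit_energy(energy, artic)
    print(f"{name}: {energy}% energy removed, "
          f"{artic}% articulation lost, ratio {ratio:.2f}")
```

On these figures the energy removed is evidently a poor guide to the articulation lost, which is the point of the paragraph.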

The two solid curves intersect on the 1550 cycle abscissa 
and at 65 per cent articulation, which shows that using only 
frequencies above or frequencies below 1550 cycles an articula- 
tion of 65 per cent will be obtained. The two dotted curves 
necessarily intersect at 50 per cent. 
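
The last remark is a matter of bookkeeping: at any cut-off frequency the energy passed by the low-pass system and that passed by the high-pass system add up to the whole, so the two energy curves can be equal only at 50 per cent. A minimal sketch follows; the spectrum values are invented for illustration and are not the data of Part One, Chapter III.

```python
def energy_below(cutoff, spectrum):
    """Per cent of total energy carried by components below `cutoff`."""
    total = sum(e for _, e in spectrum)
    return 100.0 * sum(e for f, e in spectrum if f < cutoff) / total


def energy_above(cutoff, spectrum):
    """Per cent of total energy carried by components at or above `cutoff`."""
    total = sum(e for _, e in spectrum)
    return 100.0 * sum(e for f, e in spectrum if f >= cutoff) / total


# Hypothetical spectrum: (frequency in cycles, relative energy).
spectrum = [(250, 40), (500, 30), (1000, 15), (2000, 10), (4000, 5)]

for cutoff in (300, 750, 1500, 3000):
    low = energy_below(cutoff, spectrum)
    high = energy_above(cutoff, spectrum)
    # The two fractions always add to 100, so they can be equal only at 50.
    print(cutoff, low, high, low + high)
```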

The data were analyzed to find the effect upon each of the 
fundamental sounds. The results of this analysis are shown 
in Figs. 137, 138, 139, 140, 141, and 142. These curves 



Fig. 137. — Long Vowels. (Abscissa: cut-off frequency.) 


indicate that some of the sounds are fairly well localized in a 
limited frequency range while others seem to have characteris- 
tics extending throughout the entire range. For example, the 
sound e could be recognized correctly 98 per cent of the time 
when either the range of frequencies above 1700 cycles or the 
range of those below 1700 cycles was used. On the other hand, 






the sound “s” was only slightly affected by eliminating fre- 
quencies below 1500 cycles but its characteristics were practi- 
cally destroyed by eliminating frequencies above 4000 cycles. 
The short vowels, u, o, and e, are seen to have important char- 
acteristics carried by frequencies below 1000. More than a 20 
per cent error is made in recognizing these three sounds when 
the frequency components below 1000 are eliminated. On the 
other hand the elimination of frequencies above 2000 cycles 
for these sounds produces only slight effects. The long vowels 
and the diphthong sounds seem to have sufficient distinguishing 
characteristics in either half of the frequency range to be iden- 
tified. The intersection point of the curve for these sounds is 
always above 90 per cent, showing that by using a frequency 
range on either side of the intersection point, the sounds can be 
readily identified. The fricative sounds are seriously affected 
by the elimination of the high frequencies. The elimination 
of frequencies above 3000 reduces the articulation of the sound 
“s” to 40 per cent, the sound “th” to 66 per cent, the sound 
“z” to 80 per cent, the sound “t” to 81 per cent, and the 
sound “f” to 85 per cent. All other sounds are reduced less 
than 10 per cent by the elimination of this frequency range. 
The pure vowels, the diphthongs, and the semi-vowels are 
affected only a negligible amount by the elimination of this 
region. The curves indicate that for the unvoiced stop conso- 
nants the frequencies in the region of 1000 and 3000 cycles are 
the important ones for carrying the recognition properties. 

Fig. 138. — Short Vowels. 

Fig. 139. — Stops. 

Fig. 140. — Semi-Vowels. 

Fig. 141. — Transitionals. 

Fig. 142. — Fricatives. 

(In Figs. 138 to 142 the abscissa is the cut-off frequency, 0 to 
5 × 10^3 cycles, and the ordinate scale runs from 0 to 100. Open 
circles mark the low-pass curves and solid circles the high-pass 
curves.) 






The sound “t” has a noticeable characteristic, namely, that 
the elimination of all sounds below 1500 cycles produces no 
noticeable effect upon its recognition. It is also the first one 
of this group to be affected by the elimination of the high 
frequencies. The transitional sounds w and h seem to have 
important characteristics in the frequency region between 700 
and 2000 cycles. The sound y has characteristics similar to the 
sounds i and e. When all frequencies below 1500 cycles are 
eliminated it still has an articulation of 99 per cent. 

It must be remembered that in spite of the fact that high 
articulation values are obtained for these sounds under the 
distorting conditions mentioned, the quality of the sound is 
materially altered, that is, the naturalness is considerably 
reduced. However, there are some characteristics of the sound 
which seem to be sufficient to identify it in spite of its greatly 
altered quality. For example, e sounds very much like u when 
frequencies above 1000 cycles are eliminated, but from the fact 
that the per cent articulation at this point is over 90 per cent 
it is evident that some features are still preserved in the low- 
frequency region for the sound e that distinguish it from the 
sound u. 

Tests made with women calling vs. men calling over these 
filtering systems indicated that the fricative sounds formed by 
women’s voices require very much higher frequency ranges to 
properly transmit them than those produced by men. The 
elimination of frequencies above 4400 cycles reduces the articu- 
lation for women’s voices for the sound “s” to 6H, for the sound 
“th” to 56, and for the sound “f” to 88, while for men’s voices 
these values are 95, 75, and 97, respectively. 

In most systems proposed for use certain frequency regions 
are not entirely eliminated as in the filter systems but only 
suppressed various amounts. In order to increase the efficiency 
of the system from a loudness standpoint, resonance is fre- 
quently introduced. This is accomplished by adjusting the 
mass reactions and elastic constraints in the mechanical parts 
or the inductances and capacities in the electrical parts so that 
they annul each other at certain frequencies and thus permit 



large amplitudes for small driving forces. This results in 
magnifying the amplitudes in certain frequency regions above 
those in other regions. 

The effect of such resonance on the ability of the system 
to properly transmit speech was investigated by means of the 
circuit shown in Fig. 126. An electrical network consisting of 
an inductance coil and a condenser connected in parallel was 
bridged across the circuit. By adjusting the inductance and 
capacity of these to the proper values, the desired resonant 
characteristic was given to the system. For illustration, the 



Fig. 143. — Frequency Characteristics of Resonant Systems. 


characteristics of three of the systems used in the articulation 
tests are given in Fig. 143. System No. 1 has a resonant fre- 
quency at 1100 cycles and a damping constant of 450 bels per 
second; system No. 2 a resonant frequency at 1100 cycles and 
a damping constant of 35 bels per second; system No. 3 a 
resonant frequency at 2000 cycles and a damping constant of 
220 bels per second. The ordinates represent the intensity 
level at which pure tones having the pitch represented by the 
abscissas will emerge from the receiver when they are created 
with a zero intensity level at the transmitter. Since they 




represent the necessary amplification to make the reproduced 
tones equal in intensity to the original tones, it is convenient to 
speak of these ordinates as representing the number of db 
below unity reproduction. Curves are drawn for the case 
when the attenuators in the amplifiers are set so that the 
resonant frequencies in each case are at the same level, namely, 
10 db below unity reproduction. The frequency characteris- 
tics of singly-resonant systems such as these are represented 
by curves which are symmetrical about the resonant fre- 
quency when the coordinates such as shown are used, namely, 
pitch for the abscissas and db below unity reproduction for 
the ordinates. (See Appendix F.) 
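
The text does not give the coil and condenser values used. As a rough guide to how the inductance and capacity fix the resonant frequency, the sketch below uses the standard relation for an ideal loss-free tuned circuit, f = 1/(2π√(LC)). The inductance of 0.1 henry is an arbitrary assumption, and the function names are illustrative only.

```python
import math


def resonant_frequency(inductance_h, capacitance_f):
    """Resonant frequency (cycles per second) of an ideal L-C circuit."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


def capacitance_for(frequency, inductance_h):
    """Capacity (farads) that tunes `inductance_h` to `frequency`."""
    return 1.0 / ((2.0 * math.pi * frequency) ** 2 * inductance_h)


ASSUMED_INDUCTANCE_H = 0.1  # hypothetical coil, not a value from the text

for target in (1100.0, 2000.0):  # resonant frequencies quoted in the text
    c = capacitance_for(target, ASSUMED_INDUCTANCE_H)
    check = resonant_frequency(ASSUMED_INDUCTANCE_H, c)
    print(f"{target:.0f} cycles: C = {c * 1e6:.4f} microfarads "
          f"(check: {check:.1f} cycles)")
```

The damping of the actual systems depends on the resistance in the network, which this idealized relation leaves out.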

System No. 2 was the most sharply resonant of any of the 
systems used in the test, yet it would be called very highly 
damped compared to a system like a tuning fork or a piano 
string. It was seen in Part Three, Chapter VI, that all of the 
tuning forks used for clinical purposes have damping constants 
smaller than 3 db per second. System No. 2, then, damped 
out its natural frequency of vibration one hundred times faster 
than the most highly damped tuning fork. Even so, it pro- 
duces a serious impairment to the speech. However, this 
impairment is mainly due to the unequal response for differ- 
ent frequencies rather than to “hangover effects” due to 
transients. 
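
The comparison with the tuning fork can be checked by putting both damping constants in the same unit (1 bel = 10 db). The sketch below uses only the figures in the text; the 60 db decay used for the timing comparison is an arbitrary illustrative choice.

```python
DB_PER_BEL = 10.0


def bels_to_db(bels):
    """Convert a quantity in bels to decibels."""
    return bels * DB_PER_BEL


def decay_time(level_drop_db, damping_db_per_s):
    """Seconds needed for a free vibration to fall by `level_drop_db`."""
    return level_drop_db / damping_db_per_s


system_2_db_per_s = bels_to_db(35.0)   # System No. 2: 35 bels per second
fork_db_per_s = 3.0                    # upper limit quoted for the forks

ratio = system_2_db_per_s / fork_db_per_s
print(f"System No. 2 damps about {ratio:.0f} times faster than the fork")
print(f"60 db decay: {decay_time(60.0, system_2_db_per_s):.2f} s "
      f"against {decay_time(60.0, fork_db_per_s):.0f} s")
```

The ratio comes out somewhat above one hundred, in agreement with the round figure given in the text.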

A series of articulation tests were made with systems hav- 
ing resonant frequencies at 1100 cycles but with various 
amounts of damping. These tests indicated that the articu- 
lation decreased from 96 per cent to 90 per cent, as the damp- 
ing of the system was changed from infinity, that is, corre- 
sponding to the high quality circuit, to a damping of 130 bels 
per second. The system whose characteristic is shown in Fig. 
143 by the dotted line had an articulation of 92 per cent. It 
is rather surprising to find that such large departures from 
uniform response can exist in a transmission system without 
producing more serious impairment to the speech which is 
transmitted over it. As the damping of the transmission sys- 
tem becomes smaller than 130 bels per second, the articulation 





obtained decreases at a more rapid rate, being about 80 per 
cent when the damping is reduced to about 35 bels per second. 

Tests which were made with a series of transmission sys- 
tems in which the damping was kept approximately constant, 
while the resonant frequency varied, indicated that when the 
resonant frequency varied only between 900 cycles per second 
and 2000 cycles per second there was only a small change in 
the articulation which was obtained with the system. When 
the resonant frequency was outside of this range, the articula- 
tion decreased. It should be emphasized here that for pro- 
ducing systems of highest quality such resonances within the 
voice range should be entirely avoided. 

Articulation tests were made with these resonant systems 
throughout a wide range of intensities. The articulation 
varied with intensity in a manner very similar to that shown 
in Fig. 127 for a high-quality system. It was found that at 
the high intensities the resonant system produced a greater 
relative impairment than that produced by the high-quality 
system. This is due to the fact that the frequencies near 
the resonant frequency are so loud as to become very annoy- 
ing before the other frequencies are sufficiently loud to indi- 
cate the characteristics of the speech sounds. This is particu- 
larly noticeable when a set designed for aiding the deafened 
has pronounced resonant peaks. A user complains of having 
his ear “banged” by certain vowel sounds before the loudness 
is sufficient to hear the consonants. 



CHAPTER VI 


Effect of Other Types of Distortion upon the 
Recognition of Speech Sounds 

Speech waves are distorted in a great many different ways 
besides those mentioned in the last two chapters. One way 
which is frequently encountered in amateur radio receiving sets 
is to “overload” the vacuum tubes. When the input speech 
energy is greater than that which the set is designed to handle, 
the output speech waves are distorted. Measurements upon 
telephone systems containing vacuum tube amplifiers show 
that considerable distortion of this type can be tolerated before 
any appreciable loss in articulation is produced. The results of 
one such series of measurements are shown by the curves in 
Fig. 144. The lower curves show the relation between the 
input and output levels for the speech as judged by listening 
tests. As seen from these curves, at an input level of about 
-20 decibels the output vacuum tube reaches its capacity. 
For higher levels the output speech energy is no longer pro- 
portional to the input. It is seen, however, that the articu- 
lation decreases only from 79 to 77 as the input level increases 
15 decibels above its overload point as indicated by the curve. 
For higher levels, the articulation drops off rapidly. One 
can readily notice the effect of overloading by the presence 
of a peculiar high hissing sound even before the articulation is 
noticeably affected. These tests were made with a telephone 
system containing resonant elements which accounts for the 
articulation being below 80 per cent in the range of levels 
where no overloading takes place. Under such conditions 
there are two opposing factors operating, one tending to 
increase and one tending to decrease the articulation. On 



account of the nonlinearity the weak consonant sounds are 
made stronger in comparison with the vowel sounds which 
would tend to increase the articulation. On the other hand 
the introduction of component frequencies not in original 
sounds tends to decrease the articulation. It is probably due 
to these opposing factors that the articulation changes so little 
when large distortions of this type are produced. Under cer- 
tain conditions the articulation of a circuit will very definitely 
increase when a non-linear element is introduced into the 
system. This condition is obtained when the first factor pre- 
dominates over the second one. A few circuits fulfilling this 
condition have been tested in the laboratory. 
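
The second factor, the appearance of component frequencies not present in the original sound, can be illustrated numerically. The sketch below treats overloading as simple symmetrical clipping of a pure tone, which is an assumed stand-in for the behaviour of an overloaded vacuum tube and not a model taken from the text. The harmonic amplitudes are found with a plain discrete Fourier sum.

```python
import math


def clip(x, limit):
    """Limit x to the range -limit .. +limit (idealized overload)."""
    return max(-limit, min(limit, x))


def harmonic_amplitude(samples, k):
    """Amplitude of the k-th harmonic in one full period of samples."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * k * i / n)
             for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * k * i / n)
             for i, s in enumerate(samples))
    return 2.0 * math.hypot(re, im) / n


N = 256  # samples in one period
pure = [math.sin(2 * math.pi * i / N) for i in range(N)]
overloaded = [clip(s, 0.5) for s in pure]  # clip at half the peak value

for k in (1, 3, 5):
    print(k,
          round(harmonic_amplitude(pure, k), 4),
          round(harmonic_amplitude(overloaded, k), 4))
```

The unclipped tone contains only its fundamental, while the clipped tone acquires odd harmonics that were not in the input.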

The effect of overloading upon the transmission of music 
is very much more marked than upon the transmission of 
speech. Tests made with music indicated that when the input 
was increased more than 5 decibels above the overload point, 
that is, for input levels higher than -15 on the scale shown 
in Fig. 144, the reproduced music was noticeably affected. 
Tests were made for both vocal and instrumental music by a 
number of observers listening to the quality of the reproduced 
music. This type of distortion is very common in radio 
receiving sets, being due to either poor operation or poor design 
of the set. 

Another type of distortion which is interesting is that pro- 
duced by a phonograph when the speed of the turntable during 
reproduction is different from that used when recording. The 
component frequencies in such reproduced sounds can be 
obtained from those of the recorded sounds by multiplying 
each by a common factor. For example, if the speed is twice 
that of the normal, then the frequency of each component in 
the reproduced sound is doubled. There are two effects of 
such distortion which are readily noticed; namely, the pitch 
of the speech sounds is raised and the syllables are spoken very 
much more rapidly. The effect upon the articulation is not 
so obvious, but since the characteristic frequencies of the 
various speech sounds are shifted, it becomes more difficult to 
properly understand them. 




To investigate this effect, the standard articulation lists 
were recorded on disc records. This was done in cooperation 
with The Victor Talking Machine Company. By means of an 
electromagnetic reproducer the sounds were converted into 
electrical energy and sent into the high quality telephone 
system described above, and the syllables were then observed 
in the usual manner. Speeds of rotation from about I to il 
the normal were tried. The results of these tests are shown in 
Fig. 145. Changes of speed less than 10 per cent produce very 
little effect. For greater changes the articulation falls off 
rapidly. Decreasing the speed has a greater effect than 



Fig. 144. — Effect of Overloading upon the Articulation of Speech. (Abscissa: input level.) 


increasing the speed. In all the types of distortion discussed 
above the harmonic relationship between the fundamental and 
overtones is maintained. 

In another type of distortion which is peculiar to carrier 
telephone systems this relationship is not maintained. When 
the frequency of the carrier introduced at the receiving end 






differs from the frequency of the carrier at the sending end, 
the frequencies in the speech spectrum are shifted by a definite 
amount. Each component frequency is either increased or 
decreased by the same number of cycles. For example, if this 
shift is 50 cycles, a vowel pronounced at a pitch corresponding 



Fig. 145. — Effect upon Articulation of Multiplying Component Frequencies by a Common Factor. 


to 100 cycles would be reproduced by such a system with the 
components at 150, 250, 350, 450, etc. As will be seen, the 
harmonic relationship is destroyed. 
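
The difference between the two kinds of distortion, multiplication by a common factor (the phonograph) and addition of a common shift (the carrier system), can be verified on the example just given. In the sketch below the vowel is represented by its first four components at 100, 200, 300 and 400 cycles; the function names are illustrative.

```python
def multiply(components, factor):
    """Phonograph case: every component is multiplied by the speed ratio."""
    return [f * factor for f in components]


def shift(components, offset):
    """Carrier case: every component is moved by the same number of cycles."""
    return [f + offset for f in components]


def is_harmonic(components):
    """True when every component is a whole multiple of the lowest one."""
    base = min(components)
    return all(f % base == 0 for f in components)


vowel = [100, 200, 300, 400]

doubled = multiply(vowel, 2)   # turntable at twice normal speed
shifted = shift(vowel, 50)     # carrier frequencies differing by 50 cycles

print(doubled, is_harmonic(doubled))
print(shifted, is_harmonic(shifted))
```

The doubled components remain whole multiples of the lowest one, whereas the shifted components do not.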

Articulation tests with such a system gave the results 
which are shown in Fig. 146. It is seen that the shifts greater 
than 10 cycles produce a noticeable effect, but it is remarkable 
that shifts as high as 300 or 400 cycles are possible without 
completely destroying the intelligibility. As will be seen 
from the curves of Figs. 125 and 146, an average person would 
interpret short sentences correctly over 90 per cent of the time, 
even when all the component frequencies are shifted upward 
400 cycles. The data indicate that shifting the speech fre- 
quency downward produces a more serious deterioration than 
the same shift upward. This type of distortion 
has a much more serious effect on music than on speech. Since 




musical tones are rich in harmonics, it is evident that a shift 
which changes the harmonic relationship will produce a serious 
quality damage to musical tones. 

Another common type of distortion is that produced in 
rooms which are reverberant. In such rooms there are two 
causes which are operating to decrease the articulation of the 



Fig. 146. — Effect of Frequency Shift upon the Articulation of Speech. 


received speech. The first one is due to the persistence of the 
sound after the source has been silenced. This phenomenon is 
frequently referred to as the “hangover” effect. It not only 
distorts the speech sounds by making the endings and begin- 
nings different, but it tends to mask the succeeding speech 
sounds. Particularly, the vowel sounds tend to mask the suc- 
ceeding consonants. The second effect is similar to the fre- 
quency distortions discussed in Chapter V. Due to the 
selective absorption properties of the room some frequencies 
are reproduced much more efficiently than others. Both of 
these effects are dependent upon the position of the observers 
in the room. 





It is a common practice to describe the acoustic properties 
of a room in terms of its reverberation time. The reverbera- 
tion time is the number of seconds after the source is shut off 
before the sound has decreased its intensity 6 bels. For 
example, a room having a reverberation time of one second 
has a damping of 6 bels per second. In the past, very little 
attention has been given to the source of sound in making 
reverberation time measurements. Frequently the reverbera- 
tion time is given without stating whether it is for speech, 
pure tones or for some other sound. This condition has 
probably arisen because of the difficulty of making such 
measurements. The observational error of the common rever- 
beration test is so large that it is hard to distinguish between 
different classes of sounds. To more completely describe the 
acoustic properties of the room, a curve should be given which 



Fig. 147. — Articulation vs. Reverberation Time. 


shows the reverberation time or damping for each frequency. 
Very frequently an auditorium is poor because of the lack of 
damping for the low frequencies only. Acoustic treatment 
which produces an absorption for the high frequencies in such 
a room would make it worse rather than improve it. 






Knudsen has made some articulation measurements in 
rooms of various reverberation time, but gives only one value 
for this time for each room; presumably that which corre- 
sponds to speech. One set of tests was carried out in a room 
whose reverberation time was changed by bringing into it 
various amounts of hair felt. The blank dots shown in Fig. 
147 give the results of these tests. Knudsen also made tests 
in auditoriums having different acoustic properties both before 
and after acoustic treatment. These results are shown by the 
solid dots in Fig. 147. The tests were made with the standard 
articulation lists by using three observers who were stationed 
at different parts of the room. These results indicate that it is 
a good approximation to say that the articulation decreases 
about 7 per cent for each additional second in reverberation 
time. However, the amount of articulation reduction will 
undoubtedly depend upon whether the high frequencies or the 
low frequencies are damped out most quickly. There is no 
one-to-one relationship between articulation and reverberation 
time but the above results indicate the general effect in large 
rooms. 
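
Two pieces of arithmetic from this chapter lend themselves to a short sketch: the damping implied by a reverberation time (a decay of 6 bels in that time), and the rough rule of about 7 per cent loss of articulation for each additional second. The function names are illustrative, and the second function is no more than the approximation stated above, subject to the same caution that there is no one-to-one relationship.

```python
DECAY_BELS = 6.0  # reverberation time is defined by a 6-bel decay


def damping_bels_per_second(reverberation_time_s):
    """Damping of a room whose sound decays 6 bels in the given time."""
    return DECAY_BELS / reverberation_time_s


def articulation_drop(extra_seconds, per_second=7.0):
    """Approximate loss of articulation (per cent) for added reverberation."""
    return per_second * extra_seconds


for t in (1.0, 2.0, 4.0):
    print(f"reverberation time {t} s: "
          f"damping {damping_bels_per_second(t)} bels per second")

print("two additional seconds: about",
      articulation_drop(2.0), "per cent lower articulation")
```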



CHAPTER VII 


Effect of Noise and Deafness upon the Recognition 
of Speech Sounds 

The effect of noise upon the ability to hear is very similar 
to the effect of partial deafness. As stated in the chapter on 
“Noise,” one of the best ways of describing a noise is to give 
its deafening effect. For this reason the effect of noise and the 
effect of deafness upon the ability to recognize speech are con- 
sidered together in this chapter. 

When a noise is present at the ear, the threshold for hearing 
other sounds is shifted. The noise audiogram gives the amount 
of this shift. It is the deafening effect. The effect of the 
noise upon the ability to recognize sounds has four aspects. 
First, the average threshold shift produces an effect equivalent 
to the lowering of the intensity of the speech sounds. Second, 
the unequal threshold shift for different frequency ranges pro- 
duces a distortion effect equivalent to that produced by a 
transmission system having unequal responses for different 
frequency ranges. Third, for the higher intensities there is an 
intermodulation between the noise and speech which takes 
place during their transmission through the middle ear. 
Fourth, the presence of the noise tends to distract the attention 
from the perception of the sounds. 

Although the presence of a noise may be annoying, if one 
concentrates his attention on the speech sounds he can identify 
them when they have sufficient intensity to be clearly above 
the noise. This is true only when the intensity of the speech 
is not greater than 70 or 80 decibels above the threshold. 
When the noise is so great that intensities larger than this 
must be used, the ability to recognize them will always be 


somewhat less than in a quiet place due to the intermodulation 
effect. 

The curve representing the relation between intensity and 
articulation of speech sounds received in the presence of a 
noise whose audiogram is flat, can be obtained from the curve 
of Fig. 127 by shifting those points on the curve below a sensa- 
tion level of 70 decibels an amount equal to that shown by the 
noise audiogram. A system of curves obtained in this way is 
shown in Fig. 148. The abscissas give the sensation level and 
the intensity level of the speech sounds and the ordinates the 

Fig. 148. — Articulation vs. Intensity of Received Speech in the Presence of Noise. (Abscissa: sensation level.) 


articulation which is obtained under the different noise condi- 
tions. The number on each curve gives the threshold shift 
produced by the noise as indicated by a noise audiogram. 
Experiments have shown that at the higher intensities no 
values of articulation will be obtained which are greater than 
those obtained in a quiet place. Consequently, the curves at 
these intensities have been estimated. 

It is difficult to produce a noise which has a perfectly flat 
audiogram. In Figs. 149 and 150 are shown some articulation 
curves taken in the presence of noise. The noise audiogram 








cies, thus producing a noise audiogram whose threshold has 
shifts mainly in this region, the articulation curves cannot be 
obtained by such a simple procedure as given above. For 
example, in Fig. 151 are shown the curves representing the 
articulation obtained in the presence of pure tones. The 
noise for curve No. 7 is a 2000-cycle pure tone at a sensation 
level of 78 and curve No. 8 is for a 1000-cycle tone having the 
same sensation level. It is seen that the curves No. 7 and 
No. 8 cannot be obtained from the “no-noise” curve by 


Fig. 151. — Effect of Noise upon Articulation. (Upper abscissa: sensation level, 10 to 120; lower abscissa: intensity level, 100 to 0. Curve No. 7: 2000-cycle tone at sensation level 78. Curve No. 8: 1000-cycle tone at sensation level 78.) 


making horizontal shifts as was the case for the first six curves 
given. 

Experiments were made using pure tones as the interfering 
sounds. When the frequency is below 500 cycles the resulting 
intensity level-articulation curves are of the first class men- 
tioned above; that is, the principal effect upon the curve is to 
produce a horizontal shift. When the frequency is above 500 
cycles, the shift is always considerably more for the high than 





for the low intensity levels as illustrated in Fig. 151. As a 
general rule since the low-pitch tones produce a greater mask- 
ing for the frequencies which are important for recognizing 
speech, especially when they are very intense, they also cause 
a greater reduction in articulation than the high tones of 
equal sensation level. Also noises of any complex character 
produce, in general, a greater reduction in articulation than 
that produced by the pure tones of the same sensation level 
since the threshold shift which they produce covers a much 
wider frequency range. 

As an illustration of how these data might be used to 
answer practical questions, suppose it is desired to find the 
interference to conversation in the presence of typewriter 
noise corresponding to the audiogram of Fig. 60. Throughout 
the important speech range the threshold shift is about 45 
decibels. In ordinary conversation in a room the sensation 
level is about 70 decibels. According to the curves of Fig. 149 
the articulation corresponding to this condition is 60 per cent. 
Under such articulation conditions, the persons conversing 
would probably raise their voices about 10 decibels in intensity 
level, thus increasing the articulation to 80 per cent. In a 
similar way, one can find the average interpretation under any 
specified noise condition whose corresponding audiogram can 
be considered as approximately flat. 
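
The procedure of the typewriter example can be sketched as a horizontal shift of the quiet-room curve. In the sketch below the two points (25, 60) and (35, 80) are taken from the example above (70 less 45, and 80 less 45, decibels); the remaining points of `QUIET_CURVE` are placeholders assumed for illustration and are not read from Fig. 127 or Fig. 149. The intermodulation effect at high intensities is ignored.

```python
def interpolate(points, x):
    """Piecewise-linear interpolation through sorted (x, y) points."""
    if x <= points[0][0]:
        return points[0][1]
    if x >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# (sensation level in decibels, articulation in per cent).
# Only the middle two points come from the text; the ends are assumed.
QUIET_CURVE = [(0, 0), (25, 60), (35, 80), (70, 96)]


def articulation_in_noise(speech_level_db, threshold_shift_db,
                          curve=QUIET_CURVE):
    """Articulation for speech heard through a flat-audiogram noise.

    The noise is treated as lowering the effective sensation level by
    the threshold shift, that is, as a horizontal shift of the curve.
    """
    return interpolate(curve, speech_level_db - threshold_shift_db)


print(articulation_in_noise(70, 45))  # ordinary conversation
print(articulation_in_noise(80, 45))  # voices raised 10 decibels
```

The same function serves for a flat deafness audiogram if the hearing loss is entered in place of the threshold shift.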

What has been said about noise can also be said about 
deafness. When the audiogram representing the degree of 
hearing can be represented by an approximately flat curve, 
the main effect upon the intensity articulation curve is to 
produce a horizontal shift. The articulation to be expected 
can be found in the manner described in the last paragraph. 
For example, if the noise audiogram mentioned above repre- 
sented the degree of deafness for a person rather than a noise 
condition, then that person would obtain an articulation of 
approximately 60 per cent when people were conversing with 
normal intensity and placed about three feet apart. The inten- 
sity level of speech from the speaker in a large auditorium such 
as a theatre or a church is usually between 50 and 60 decibels. 





Under such circumstances the deafened person mentioned 
above would have serious difficulty understanding anything 
at all. It must be remembered that these figures apply to the 
average person and that there will be considerable variations 
from this average, depending upon the intellect of the deafened 
person. With the speech levels emerging from a telephone 
receiver in practise, unless a person’s hearing is reduced more 
than 30 decibels below normal, he should have little difficulty 
in understanding most conversations over the telephone. It is 
necessary to maintain this high speech level over the telephone 
because of the room noise which is present at most subscribers’ 
stations and also on account of noises which are picked up 
by the line. 

For certain types of deafness there is added to this shift 
effect a distortion effect. For such cases the articulation is 
always lower than that calculated by the above method. 
Persons having nerve deafness, which is indicated by the 
audiogram dropping rapidly at the high frequencies, have a 
greater difficulty in trying to understand speech than those 
whose audiograms are approximately flat. Some persons 
having this type of deafness are unable to interpret speech 
regardless of the level at which it is introduced into the ear. 

In this connection it is interesting to note that those sounds 
which are most difficult to hear and interpret such as f, th, s, 
etc., are among the easiest sounds to interpret by noting the 
position of the lips. On the other hand the vowel sounds which 
are easier to hear are difficult to interpret from the lip positions. 
For this reason, hearing and lip-reading materially aid each 
other under those conditions where it is difficult to make the 
proper interpretations. Deafened persons realize this and can 
recognize very much better those speech sounds which they 
hear when at the same time they are able to clearly see the lip 
positions. 



APPENDICES 







APPENDIX A 


A DRAWING showing the arrangement for calibrating a con- 
denser transmitter by means of a thermophone is given in 
Fig. 1. Hydrogen gas is sent into one of the capillary tubes 
shown and out of the other until the enclosed chamber is filled 
with this gas. Capillary tubes are used for this purpose so 
that from an acoustic standpoint the chamber may be con- 
sidered completely closed. The measurements are made with 
hydrogen gas so that the wavelength will be as large as possible 
compared to the size of the chamber. This is necessary, for in 



Appendix A. Fig. 1. Arrangement for Calibrating Electrostatic Transmitter.

the development of the formula given below it was assumed 
that variations of pressure at different positions within the 
chamber are all in phase. When the wavelength is comparable 
to the size of the chamber, standing wave patterns are set up. 
Under such conditions this formula does not hold. 

A direct current $I_0$ is sent through the gold foil strip, 
heating it to a temperature $\theta_0$. The final temperature which 
it assumes depends upon $I_0$ and also upon the heat capacity 
of the foil, the properties of the gas, and the size of the chamber. 
It can be calculated from the change in resistance of the foil. 
An alternating current $I_1 \cos \omega t$ is then superimposed upon the 
direct current. It causes fluctuations in the temperature of the 
gold strip and also in the gas immediately surrounding it. This 
in turn causes fluctuations in pressure which depend upon the 
rate at which the heat is conducted and radiated away from the 
gas layer next to the strip. The following formula was deduced 
by E. C. Wente of Bell Telephone Laboratories, and its validity 
has been demonstrated experimentally. The pressure variation 
$p$ is given by 


p = 


. 478 i?/o/i cos (o)/ — ^)M 


GQ - UF 


Aaoo\ 


a 


/ 


"+ {fq fug- — )' 


$R$ = resistance of thermophone strip; 
$I_0$ = direct current; 
$I_1$ = amplitude of alternating current; 
$\omega$ = $2\pi$ times the frequency of variation; 
$\phi$ = phase between pressure and current; 
$M$ = correction factor which is nearly unity and is given by 

^ ^2 \H/ o n VA 




7^0 a {pc?, 


cc 


1 (aV)- 


S = area of the walls of the chamber: 

$V_0$ = volume of the chamber; 
a is related to the heat diffusibility and is given by 


a 



$\rho_0$ = density of the gas; 
$C_p$ = specific heat of the gas at constant pressure; 

K = the heat conductivity of the gas; 
a' is the same as a except that the constants involved 
refer to the thermophone strip. 


V jaTsf k — I 00 


ia 





Q — laoiK. -j- %ciCdQ^ 
U — a.yo3 + laoiK 




FoaTa/ 


napo 


\ 


I — 


I 00 


Ta 


$k$ = ratio of the specific heats of the gas at constant volume 
and at constant pressure $= C_p/C_v$; 
$A$ = the reciprocal of the mechanical equivalent of heat; 
$a$ = area of one side of the thermophone strip; 
$\theta_0$ = temperature of the thermophone strip due to $I_0$; 
$\gamma$ = heat capacity per unit area of the strip; 
$T_a$ = average temperature of gas within the enclosure; 
$p_0$ = average pressure within the enclosure; 

C = Plane’s radiation constant. 

AciCxi 

FQ^GU - — 

/7 

The phase 0 is determined by tan 0 


GQ - FU 


Aat 


For frequencies above 50 cycles the effect of radiation can 
be neglected and the above formula is simplified thus: 

^ + Dy^ 

where p and I are either maximum or effective values and 


N 


yooF :)a f 


2po \ 


(Ta - 


^00 


B ^ 1 + 


^Ka 

yco 


D 


2 a 


"00 


Foa k 

la 


-do 


Neglecting the radiation effect causes a 2 per cent error at 
32 cycles. The correction factor M varies from .7 at 32 cycles 
to .98 at 1000 cycles. 



APPENDIX B 


Derivations of Equations Giving Relations of Pressure, 
Density, Velocity, Displacement, and Power in a Plane Wave 

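The equilibrium position of layer (1) is at $x$ and layer (2) 
at $x + dx$ as shown in Fig. 1. At a time $t$ these layers are at 

$$x + y \quad\text{and}\quad x + dx + y + \frac{\partial y}{\partial x}\,dx.$$

Appendix B. Figure 1. 

Since the mass included between them is the same in both 
instances 

$$\rho_0\,dx = (\rho_0 + \rho)\,dx' = (\rho_0 + \rho)\left(1 + \frac{\partial y}{\partial x}\right)dx \tag{1}$$

or neglecting $\rho$ compared to $\rho_0$ 

$$\frac{\partial y}{\partial x} = -\frac{\rho}{\rho_0}. \tag{2}$$

The forces acting on the layer of air at a time $t$ are 

$$P + p - \left(P + p + \frac{\partial p}{\partial x}\,dx\right) = -\frac{\partial p}{\partial x}\,dx.$$

This is equal to the mass times the acceleration of the layer or 

$$-\frac{\partial p}{\partial x}\,dx = \rho_0\,dx\,\frac{\partial^2 y}{\partial t^2}. \tag{3}$$

From the adiabatic law of gases 

$$\frac{P + p}{P} = \left(\frac{\rho_0 + \rho}{\rho_0}\right)^{\gamma} \quad\text{or,}\quad p = \frac{\gamma P}{\rho_0}\,\rho. \tag{4}$$

Substituting from (2) and (4) 

$$\frac{\partial^2 y}{\partial t^2} = a^2\,\frac{\partial^2 y}{\partial x^2} \tag{5}$$

where 

$$a = \sqrt{\frac{\gamma P}{\rho_0}}. \tag{6}$$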

Equation (5) is the desired equation of motion. A solution of 
this equation is 

$$y = A\cos\left(\omega t + \frac{\omega}{a}\,x\right) \tag{7}$$

where $A$ and $\omega$ are arbitrary constants. If equation (7) is 
written in the familiar form of a wave equation, namely 

$$y = A\cos 2\pi\left(\frac{\omega}{2\pi}\,t + \frac{x}{\lambda}\right) \tag{8}$$

it is seen that $a$ is the velocity of propagation of the wave since 
it is equal to the frequency $\omega/2\pi$ times the wave length $\lambda$. Also 
$A$ is the amplitude of vibration of each air particle as the wave 
passes. A summation of any number of terms of the above 
form will also satisfy the equation. These equations enable 
one to write down the relations between the amplitude and 
velocity of the vibrating particles of air, the excess density, and 
the excess pressure caused during the passage of the wave. 
If the same letters are used to represent r.m.s. values instead 
of instantaneous values, then the amplitude $y$ of the vibrating 
particles is given by 

$$y = \frac{A}{\sqrt{2}}. \tag{9}$$





The velocity $v$ of an air particle is 

$$v = \left(\frac{\partial y}{\partial t}\right)_{\text{r.m.s.}} = \omega y \tag{10}$$

with a phase of 90° compared to the displacement. The excess 
density $\rho$ is by equations (2) and (7) 

$$\rho = \rho_0\,\frac{\omega y}{a} = \rho_0\,\frac{v}{a} \tag{11}$$

and is opposite in phase to the velocity $v$. The excess pressure 
$p$ is given by equation (4) or 

$$p = a^2\rho = \rho_0 a v = \rho_0 a \omega y \tag{12}$$

with a phase of 180° from the velocity $v$ or 90° behind $y$. The 
quantity $(\rho_0 a)$ is called the radiation resistance and may be 
designated by $r$, then 

$$p = rv. \tag{13}$$

This equation is similar in form to the equation expressing 
Ohm's law for electrical circuits; $p$ corresponding to potential 
difference, $r$ to electrical resistance, and $v$ to electrical current. 
By analogy, then, it might be inferred that the intensity $J$ of 
the sound or power flowing through a square centimeter would 
be given by 

$$J = pv = rv^2. \tag{14}$$


That these equations do hold is easily shown as follows. The 
kinetic energy in a tube of the sound waves which is $a$ centimeters 
long, is given by 

$$\text{K.E.} = \int_0^a \frac{1}{2}\rho_0\left(\frac{\partial y}{\partial t}\right)^2 dx = \frac{1}{4}\rho_0 A^2\omega^2\left[a - \int_0^a \cos 2\left(\omega t + \frac{\omega}{a}\,x\right)dx\right]. \tag{15}$$

The last term is fluctuating with time but is always negligibly 
small compared to the first term. Therefore, 

$$\text{K.E.} = \frac{1}{4}\rho_0 a A^2\omega^2 = \frac{1}{2}rv^2. \tag{16}$$

Since there is a constant interchange between potential and 
kinetic energy, they must be on the average equal. The total 
energy in a tube of unit cross-section which is $a$ centimeters 
long is then 

$$J = rv^2$$

which is the intensity of the sound, since this energy moves 
along with the wave and passes through a unit cross-section 
every second. The fractional change in pressure $p/P$ is related 
to the fractional change in density by the formula 

$$\frac{p}{P} = \gamma\,\frac{\rho}{\rho_0}$$

as will be seen from equations (12) and (6). 

For example, consider these relationships for the average 
speech intensity. For air, the radiation resistance is 41.5 at 
20° C. The average speech intensity at 10 centimeters from 
the mouth is approximately .01 microwatt or .1 erg per second. 
Consequently, the excess pressure $p$ is 2 bars as compared to 
1,000,000 bars for the undisturbed state. The velocity v is 
.05 cm/sec. The displacement y, if it is assumed that most 
of the energy is at 100 cycles, is toVs- millimeter. At other 
frequencies, it is inversely proportional to the frequency. The 
excess density $\rho$ is $2 \times 10^{-9}$ grams/cc. 
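
The figures in this example follow from the relations J = rv², p = rv, v = ωy, and p = a²ρ derived above. As a check, the short Python sketch below repeats the arithmetic in c.g.s. units. The function name and the velocity of sound (taken here as 34,400 cm/sec at 20° C.) are assumptions introduced only for illustration; the intensity, radiation resistance, and frequency are the values quoted in the text.

```python
import math

def plane_wave_quantities(intensity, radiation_resistance, sound_speed, frequency):
    """Return r.m.s. velocity, excess pressure, displacement, and excess
    density for a plane wave, all in c.g.s. units."""
    v = math.sqrt(intensity / radiation_resistance)      # from J = r v^2
    p = radiation_resistance * v                         # from p = r v
    y = v / (2.0 * math.pi * frequency)                  # from v = omega y
    rho = p / sound_speed ** 2                           # from p = a^2 rho
    return v, p, y, rho

# Quoted values: J = .1 erg per second per sq. cm., r = 41.5, 100 cycles.
# The sound speed of 3.44e4 cm/sec is an assumed value, not from the text.
v, p, y, rho = plane_wave_quantities(0.1, 41.5, 3.44e4, 100.0)
print(f"velocity     = {v:.3f} cm/sec")
print(f"pressure     = {p:.2f} dynes per sq. cm.")
print(f"displacement = {y * 10.0:.2e} millimeter")
print(f"density      = {rho:.1e} grams/cc")
```

With these inputs the computed velocity and pressure round to the values given in the text, and the displacement scales inversely with whatever frequency is assumed.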



APPENDIX C 


Let the pressure variation of the air in front of the drum 
of the ear be designated by $p$. Since the pressure of the air 
in the middle ear balances the undisturbed outside air pressure 
this change in pressure multiplied by the effective area of the 
ear drum is the only effective force that produces displacements. 
Let the displacement of the fluid of the cochlea near 
the oval window be designated by $X$. If Hooke's law held for 
all the elastic members taking part in the transmission of sound 
to the inner ear, then 

$$X = kp \tag{1}$$

where $k$ is a constant. 

It would be expected from the anatomy of the ear that 
Hooke's law would start to break down even for small displacements. 
So in general the relation between the force $p$ 
and the displacement $X$ can be represented by 

$$X = f(p) = a_0 + a_1 p + a_2 p^2 + a_3 p^3 + \cdots \tag{2}$$

where the coefficients $a_0, a_1, a_2, \ldots$ belong to the expansion of 
the function into a power series. Now if $p$ is a sinusoidal 
variation, then 

$$p = p_0 \cos \omega t, \tag{3}$$

where $\omega$ is $2\pi$ times the frequency of vibration. Substituting this 
value in (2), terms containing the cosine raised to integral 
powers are obtained. These can be expanded into multiple 
angle functions. For example, for the first four powers 

$$\cos^2 \omega t = \tfrac{1}{2}\cos 2\omega t + \tfrac{1}{2}, \tag{4}$$

$$\cos^3 \omega t = \tfrac{1}{4}\cos 3\omega t + \tfrac{3}{4}\cos \omega t, \tag{5}$$

$$\cos^4 \omega t = \tfrac{1}{8}\cos 4\omega t + \tfrac{1}{2}\cos 2\omega t + \tfrac{3}{8}. \tag{6}$$

It is evident then that the displacement $X$ will be represented 
by a formula 

$$X = b_0 + b_1\cos\omega t + b_2\cos 2\omega t + b_3\cos 3\omega t + \cdots$$

In other words when a periodic force of only one frequency is 
impressed upon the ear drum, this same frequency and in 
addition all its harmonic frequencies are impressed upon the 
fluid of the inner ear. 

If two pure tones are impressed upon the ear then $p$ is given 
by 

$$p = p_1\cos\omega_1 t + p_2\cos\omega_2 t.$$

If this value is substituted in equation (2), terms of the form 
$\cos^n\omega_1 t$ and $\cos^m\omega_2 t$ and $\cos^n\omega_1 t\,\cos^m\omega_2 t$ are obtained. The 
first two forms give rise to all the harmonics and the third 
form gives rise to the summation and difference tones. For 
example, the first four terms are 

$$a_0 = a_0,$$

$$a_1 p = a_1\left(p_1\cos\omega_1 t + p_2\cos\omega_2 t\right),$$

$$a_2 p^2 = a_2\left[\tfrac{1}{2}p_1^2\cos 2\omega_1 t + \tfrac{1}{2}p_2^2\cos 2\omega_2 t + p_1 p_2\left(\cos(\omega_1 - \omega_2)t + \cos(\omega_1 + \omega_2)t\right) + \tfrac{1}{2}\left(p_1^2 + p_2^2\right)\right],$$

$$\begin{aligned}
a_3 p^3 = a_3\Big[&\left(\tfrac{3}{4}p_1^3 + \tfrac{3}{2}p_1 p_2^2\right)\cos\omega_1 t + \tfrac{1}{4}p_1^3\cos 3\omega_1 t \\
&+ \left(\tfrac{3}{4}p_2^3 + \tfrac{3}{2}p_1^2 p_2\right)\cos\omega_2 t + \tfrac{1}{4}p_2^3\cos 3\omega_2 t \\
&+ \tfrac{3}{4}p_1^2 p_2\cos(\omega_2 t + 2\omega_1 t) + \tfrac{3}{4}p_1^2 p_2\cos(\omega_2 t - 2\omega_1 t) \\
&+ \tfrac{3}{4}p_1 p_2^2\cos(\omega_1 t + 2\omega_2 t) + \tfrac{3}{4}p_1 p_2^2\cos(\omega_1 t - 2\omega_2 t)\Big].
\end{aligned}$$

Therefore, unless there is a linear relation between a force 
acting on the ear drum and the displacement at the oval 
window, that is, unless all the coefficients in equation (2) are 
zero except $a_1$, the harmonics and the summation and difference 
tones will be impressed upon the fluid in the cochlea of the 
inner ear. 
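
This conclusion can be illustrated numerically. The sketch below passes two pure tones through a power series of the form of equation (2), cut off after the cubic term, and lists the frequencies present in the result. The coefficient values, the choice of 5 and 7 cycles per analysis window for the two tones, and the function names are all assumptions made only for illustration.

```python
import math

def displacement(p, a1=1.0, a2=0.1, a3=0.05):
    """Power series of equation (2) with a0 = 0, cut off after the cubic term.
    The coefficient values are illustrative assumptions."""
    return a1 * p + a2 * p ** 2 + a3 * p ** 3

def component_amplitudes(samples):
    """Amplitude of each whole-number frequency, found by a direct Fourier sum."""
    n = len(samples)
    amplitudes = {}
    for k in range(n // 2):
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        scale = 1.0 / n if k == 0 else 2.0 / n
        amplitudes[k] = scale * math.hypot(re, im)
    return amplitudes

N = 64          # samples in one analysis window
F1, F2 = 5, 7   # the two impressed frequencies, in cycles per window

pressure = [math.cos(2 * math.pi * F1 * i / N) + math.cos(2 * math.pi * F2 * i / N)
            for i in range(N)]
amps = component_amplitudes([displacement(p) for p in pressure])
present = sorted(k for k, a in amps.items() if a > 1e-9)
print(present)
```

Besides the two impressed frequencies, the list contains the harmonics (10, 14, 15, 21), the difference and summation tones (2 and 12), the third-order combinations (3, 9, 17, 19), and a steady term at zero frequency. Setting a2 and a3 to zero leaves only 5 and 7.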



APPENDIX D 


The relation between hearing loss in sensation units and 
the maximum distance for hearing and interpreting speech is 
established as follows. It is well known that the intensity of 
sound decreases as the inverse square of the distance from the 
source. This is true only where there are no solid objects in 
the vicinity which cause reflections. Stated mathematically, 

$$\frac{I}{I_0} = \frac{d_0^2}{d^2} \tag{1}$$

where $I$ is the intensity at a distance of $d$ and $I_0$ the intensity 
at a distance of $d_0$ from the sound source. From the definition 
of hearing loss given 

$$\text{H.L.} = 10\log\frac{I}{I_0}. \tag{2}$$

Combining equations (1) and (2) 

$$\text{H.L.} = 20\log\frac{d_0}{d}. \tag{3}$$

It has been found that the speech sounds entering a normal 
ear from a caller whose lips are 1½ inches or ⅛ foot from the 
opening of the ear canal may be attenuated 80 decibels before 
50 per cent of the called numbers are recognized incorrectly. 
This was determined experimentally by means of the 
high-quality telephone system described in Part Four, Chapter IV. 
The dial of the attenuator was first set so that the speech 
emerging from the telephone receiver had an intensity equal to 
that produced when talking directly into the ear at ⅛ foot 
distance. The attenuator was then turned until the person 
of normal hearing could only recognize half of the numbers 
called. The difference in the settings was found to be equal 
to 80 decibels. In a similar way it was found that the 
corresponding differences when an average whisper, a pp voice, 
and an ff voice were used were approximately 50 decibels, 65 
decibels, and 95 decibels, respectively. 

The normal distances for interpreting numbers can be 
obtained from these figures. From equation (3) it follows that 

$d_0$ = 39.5 feet for the average whisper; 
$d_0$ = 222 feet for the pp voice; 
$d_0$ = 1250 feet for the mf voice, and 
$d_0$ = 7040 feet for the ff voice. 


Using these values for do and equation (3) the correspond- 
ing values of H.L. (hearing loss) and distance d given in 
Table XXII, were calculated. 
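
The normal distances quoted above can be reproduced from equation (3) by treating each attenuation as the hearing loss that reduces the normal distance to the reference distance. The Python sketch below does this, assuming the reference distance is ⅛ foot (lips 1½ inches from the ear canal) and that logarithms are to base 10. The function and variable names are illustrative and do not come from the text.

```python
import math

REFERENCE_DISTANCE_FT = 1.0 / 8.0   # assumed: lips 1 1/2 inches from the ear canal

# Attenuations quoted in the text for recognition of half the called numbers.
ATTENUATION_DB = {
    "average whisper": 50.0,
    "pp voice": 65.0,
    "mf voice": 80.0,
    "ff voice": 95.0,
}

def normal_distance_ft(attenuation_db):
    """Distance d0 at which a normal ear just interprets numbers,
    obtained by solving H.L. = 20 log (d0/d) for d0."""
    return REFERENCE_DISTANCE_FT * 10.0 ** (attenuation_db / 20.0)

def hearing_loss(d0_ft, d_ft):
    """Hearing loss in sensation units for a maximum hearing distance d."""
    return 20.0 * math.log10(d0_ft / d_ft)

for voice, db in ATTENUATION_DB.items():
    print(f"{voice}: d0 = {normal_distance_ft(db):.1f} feet")

# A patient who hears the mf voice at 125 feet instead of 1250 feet:
print(f"hearing loss = {hearing_loss(1250.0, 125.0):.0f} sensation units")
```

The computed distances agree with the quoted figures to within rounding; the largest difference is for the ff voice, where the sketch gives a value a fraction of one per cent below 7040 feet.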

Let the maximum distances for hearing the average whisper, 
pp voice, mf voice, and ff voice be $d'$, $d''$, $d'''$, and $d''''$ for the 
patient, and $d_0'$, $d_0''$, $d_0'''$, and $d_0''''$ for a person of normal 
hearing. Then from equation (3), 

$$\text{H.L.} = 20\log\frac{d_0'}{d'} = 20\log\frac{d_0''}{d''} = 20\log\frac{d_0'''}{d'''} = 20\log\frac{d_0''''}{d''''}$$

or 

$$\frac{d_0'}{d'} = \frac{d_0''}{d''} = \frac{d_0'''}{d'''} = \frac{d_0''''}{d''''}. \tag{4}$$

This equation states that regardless of the intensity of 
voice used, the ratio of the hearing distance for the normal 
ear to that for the ear of the patient is the same. Let $x'$, $x''$, 
$x'''$, and $x''''$ be the increased hearing distances corresponding 
to a hearing improvement $q$ for the four types of voices. Then 

$$q = 20\log\left(1 + \frac{x'}{d'}\right).$$

Similarly 

$$q = 20\log\left(1 + \frac{x''}{d''}\right) = 20\log\left(1 + \frac{x'''}{d'''}\right) = 20\log\left(1 + \frac{x''''}{d''''}\right)$$

or, 

$$\frac{x'}{d'} = \frac{x''}{d''} = \frac{x'''}{d'''} = \frac{x''''}{d''''}. \tag{5}$$






This leads to the very important conclusion that if a patient 
shows an improvement of 1 foot when using the average 
whisper he will also show an improvement of 5.6 feet, 31.7 feet, 
and 178 feet for the other three intensities of calling. This is 
true regardless of the amount of the improvement; if it is 
small, the additional distance will be added to a large distance; 
if large, to a small distance. 
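
The figures of 5.6, 31.7, and 178 feet follow from the proportionality of the gains in distance to the patient's hearing distances, which in turn are proportional to the normal distances. A minimal Python sketch of the calculation is given below; it uses the normal distances quoted earlier in this appendix, and the names are illustrative only.

```python
# Normal distances in feet, as quoted in the text for the four voices.
NORMAL_DISTANCE_FT = {
    "average whisper": 39.5,
    "pp voice": 222.0,
    "mf voice": 1250.0,
    "ff voice": 7040.0,
}

def improvements_ft(whisper_gain_ft):
    """Gain in hearing distance for each voice, given the gain for the whisper.
    The gain x is proportional to the patient's distance d, and d is
    proportional to the normal distance d0 for that voice."""
    base = NORMAL_DISTANCE_FT["average whisper"]
    return {voice: whisper_gain_ft * d0 / base
            for voice, d0 in NORMAL_DISTANCE_FT.items()}

gains = improvements_ft(1.0)
for voice, gain in gains.items():
    print(f"{voice}: {gain:.1f} feet")
```

The ratios depend only on the normal distances, so the same multipliers apply whatever the size of the improvement measured with the whisper.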



APPENDIX E 


If a condenser is connected across the two wires of a long 
transmission line which has a characteristic impedance of 
600 ohms, what will be the loudness loss of the reproduced 
speech? It is easily seen that the ratio $r$ of the current flowing 
into the second impedance before and after the condenser is 
strapped across the line is 

Appendix E. Figure 1. 

$$r = \sqrt{1 + (.0012\pi f)^2}$$

where $f$ is the frequency. The value of $y$ then becomes 

$$y = \left[1 + (.0012\pi f)^2\right]^{-1/3}.$$

When these values of $y$ were plotted against values of 
$x = \int_0^f G(f)\,df$ which are obtained from Fig. 119, the resulting 
areas were found to be .49 and .39 for the high quality and 
resonant circuits, respectively. The corresponding values in 
decibels are 9.3 and 12.3, which are the effective losses required. 
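
The steps of this calculation can be sketched in Python under several stated assumptions. The coefficient .0012 is what a condenser of 2 microfarads bridged across a 600-ohm line would give; that capacity is an inference and is not stated above. The factor y is taken as the cube root of the received power ratio, and the loss in decibels is taken as 30 times the logarithm of the reciprocal of the area, a rule inferred from the quoted pairs (.49 and 9.3, .39 and 12.3) and not given explicitly in the text. The function names are illustrative.

```python
import math

LINE_IMPEDANCE_OHMS = 600.0
ASSUMED_CAPACITY_FARADS = 2.0e-6   # assumption: chosen so that Z * C = .0012

def current_ratio(frequency):
    """Ratio r of the received current before and after the condenser is bridged."""
    x = math.pi * frequency * LINE_IMPEDANCE_OHMS * ASSUMED_CAPACITY_FARADS
    return math.sqrt(1.0 + x * x)

def loudness_factor(frequency):
    """Factor y, assuming it is the cube root of the received power ratio."""
    return current_ratio(frequency) ** (-2.0 / 3.0)

def loss_db(area):
    """Effective loss, assuming it equals 30 log10 (1/area)."""
    return -30.0 * math.log10(area)

for f in (100.0, 1000.0, 3000.0):
    print(f"f = {f:6.0f}  r = {current_ratio(f):6.3f}  y = {loudness_factor(f):.3f}")

print(round(loss_db(0.49), 1), round(loss_db(0.39), 1))
```

The areas themselves depend on the function G(f) of Fig. 119, which is not reproduced here, so the sketch starts from the areas quoted in the text.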





APPENDIX F 


In singly-resonant systems the velocity ($v$) (current for 
electrical systems) produced by a force $F$ (electromotive force 
for electrical systems) is known to be given by 

$$v = \frac{F}{R + j\left(m\omega - \dfrac{S}{\omega}\right)} \tag{1}$$

where $R$ is the resistance, $m$ the mass (inductance for electrical 
systems), and $S$ the elastic constant (the reciprocal of the 
capacity for electrical systems). The resonant frequency in 
kilocycles $f_0 = \omega_0/2000\pi$ is given by the condition that the last 
term of the denominator is zero or 

$$f_0 = \frac{1}{2000\pi}\sqrt{\frac{S}{m}}. \tag{2}$$

The damping constant $\Delta$ in bels per second is known to be 
related to $m$ and $R$ by the equation 

$$\Delta = .4343\,\frac{R}{m}. \tag{3}$$

Let $a$ be the number of bels down from the velocity amplitude 
corresponding to the resonant frequency; then obviously $a$ is 
given by 

$$a = 2\log\left|\frac{v_0}{v}\right| \tag{4}$$

where the parallel vertical bars indicate absolute values must 
be taken. Then substituting and reducing 

$$a = \log\left[1 + 7.45\times 10^{6}\left(\frac{f_0}{\Delta}\right)^2\left(\frac{f}{f_0} - \frac{f_0}{f}\right)^2\right]. \tag{5}$$

Or if the value of the pitch $P$ in octaves be introduced from the 
relation $P = \log_2 f$, this reduces to 

$$a = \log\left[1 + 7.45\times 10^{6}\left(\frac{f_0}{\Delta}\right)^2\left(2^{P - P_0} - 2^{-(P - P_0)}\right)^2\right] \tag{6}$$

which shows that $a$ has the same value for those tones which 
have the same difference in pitch above or below the resonant 
pitch. In other words, curves which represent $P$ and $a$ are 
symmetrical about $P_0$. 
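
The symmetry in pitch can be checked with a short Python sketch. It assumes that the number of bels down has the form a = log[1 + 7.45 × 10⁶ (f₀/Δ)² (f/f₀ − f₀/f)²], with f and f₀ in kilocycles and the damping constant Δ in bels per second, and that Δ equals (R/m) log e, which is the relation that yields the constant 7.45. The resonant frequency and damping used in the example are hypothetical values, and the function names are illustrative.

```python
import math

def bels_down(f, f0, damping):
    """Bels below the response at resonance.
    f and f0 are in kilocycles, damping in bels per second."""
    detune = f / f0 - f0 / f
    return math.log10(1.0 + 7.45e6 * (f0 / damping) ** 2 * detune ** 2)

def constant_from_definitions():
    """(2000 pi log10 e)^2, the constant that results if the damping
    constant is taken as (R/m) log10 e bels per second (an assumption)."""
    return (2000.0 * math.pi * math.log10(math.e)) ** 2

F0 = 1.0          # resonant frequency in kilocycles (hypothetical)
DAMPING = 500.0   # damping constant in bels per second (hypothetical)

for octaves in (0.5, 1.0, 2.0):
    above = bels_down(F0 * 2.0 ** octaves, F0, DAMPING)
    below = bels_down(F0 / 2.0 ** octaves, F0, DAMPING)
    print(octaves, round(above, 3), round(below, 3))

print(f"{constant_from_definitions():.3e}")
```

For each pitch interval the two printed values coincide, which is the symmetry about the resonant pitch described above.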




INDEX OF NAMES 


Abraham, 134 
Adrian, E. D., 126 
Arnold, H. D., 134, 196 

Beck, C. J., 70 
Bevier, L., 20 
Boltzmann, F. L., 132 
Boring, E. G., 118 
Bowlker, 193 
Bunch, C. C., 215 

Crandall, I. B., 26, 54, 77, 134 

Dewey, G., 81 
Dean, L. W., 215 

Fowler, E. P., 184, 199 
Free, E. E., 106 
Fry, T. C., 191 

Graham, F. H., 181 
Gray, A. A., 118, 124 

Harrison, H. C., 23 
Hart, V. W., 127 
Hartley, R. V. L., 191 
Helmholtz, H. von, 14, 19, 46, 116, 118 
Herman, L., 20, 47 
Hewlett, C. W., 139 

Johnson, K. S., 279 
Jones, E H., 118, 215 

Keith, 117 
Kingsbury, B. A., 227 
Knudsen, V. O., 118, 145, 215, 296 
Koenig, R., 16 
Koenig, W., 51 
Kranz, F. W., 134, 215 

Lane, C. E., 96, 139, 166, 176, 183, 195 

Macfarlan, D., 209 
Mackenty, J. E., 13 
Mackenzie, D., 77, 227 
Marx, 123 
Maxfield, J. P., 23 
Mayer, A. A., 167, 180 
Merritt, E., 17 
Miller, D. C., 17, 58 
Moore, C. R., 90 

Nichols, E. 17 

Paget, Sir Richard, 3, 11, 58 
Porter, E. L., 127 

Rayleigh, Lord, 133 
Riesz, R. R., 150 
Ripman, W., 7 
Rousselot, P., 16 

Sabine, W. C., 15, 227 
Sacia, C. F., 26, 65, 70, 75 
Scott, L., 16 
Scripture, E. W., 21, 47, 125 
Seashore, C. E., 215 
Shambaugh, G. E., 116 
Sivian, L. J., 68 
Stewart, G. W., 189 
Stewart, J. Q., 10, 269 
Stumpf, C., 58 

ter Kuile, E., 116 
Toepler, A., 132 

Wead, 133 
Webster, A. G., 133 
Wegel, R. L., 90, 139, 166, 176, 183, 199 
Weiss, O., 15 
Wente, E. C., 27, 134, 306 
Wheatstone, C., 47 
Wien, 133 
Wilkinson, G., 118, 124 
Willis, 47 
Wittmaack, 123 

Wrightson, Sir Thomas, 117, 129 
Yoshii, 123 



SUBJECT INDEX 


A 

Acoumeter 

relation of measurements with, to sensation units, 206-208 
test for hearing acuity, 206-208 
Acoustic spectra 
effect of pitch on, 51, 54, 55 
of musical sounds, 90-95 
of vowel sounds, 50-55 
of watch tick, 208 
Acuity 

distances for hearing average speech, 
187 

in presence of noise, 185-187 
methods of testing, 198-221 
of normal ear, 132-142 
relation between various tests for, 202- 
221 
Alphabet 

International Phonetic Association, 5 
National Education Association, 5 
Simplified Spelling Board, 5 
Standard (Bell Telephone Laboratories), 5, 6 

American Federation of Organizations for 
the Hard of Hearing, 212 
Analyzer, Electrical Harmonic, 51-55, 88-90 

Anatomy 

of hearing, see Organs of hearing 
of speech, see Organs of speech 
Anvil (ossicle), 112, 113, 118, 121 
Articulation 
consonant, 255 

effect of deafness on, 297-302 
effect of frequency distortion on, 279- 
289 

effect of intensity on, 270-278 
effect of noise on, 297-301 


effect of resonance distortion on, 287-289, 318, 319 

effect of reverberation on, 294-296 
effect of speed of rotation of phono- 
graph on, 291-293 

effect of vacuum-tube overload on, 290- 
292 

letter, 255 

relation between syllable and letter, 
266, 268 

relation to intelligibility, 266 
relation to probability, 268, 269 
sound, 255 

standard test for, 256-258 
syllable, 255 
vowel, 255 

Artificial larynx, 12, 13 
Artificial production of speech sounds 
acoustic, 11, 12 
electrical, 10, 11 
Audibility 

methods of measuring, 132-140 
threshold of, 132-142 
threshold of feeling, 132-143 
Audiometer 

buzzer, see Buzzer audiometer 
description of, 103-105 
phonograph, see Phonograph audiom- 
eter 

pitch-range, see Pitch-range audiom- 
eter, 104, 216-218 
use in measuring hearing, 211-221 
use in measuring noise, 103-105 
Audiogram 

noise, definition, 104 
typical deafness, 200, 217 
typical noise, 106, 186, 299 
Audition, see Hearing 




Auditory Masking, 167-187 
see Masking 

Auditory Meatus, 111, 113 
Auditory Nerve Action, 125-131 
Auditory Sensation Area 
definition, 144 
diagram, 141 

Average Speech Power, 64-71 
B 

Balance method of measuring noise, 105 
Basilar membrane, 114-117, 121, 122 
positions on, for sensing pitches, 164- 
166 

response to musical tones, 252-254 
vibration form, 180-185 
Beats 

binaural, 188-196 
objective, 194 
subjective, 194-196 
Bel, definition, 68, 69 
Binaural 

beats, 188-196 

location of complex sounds, 192-196 
loudness, comparison with monaural 
loudness, 237, 238 
masking, 172, 173 
Bone conduction, 174 

function in producing objective binau- 
ral beats, 194 

Buzzer audiometer (Nos. 3-A and 5-A) 
description, 103, 214, 215 
use in measuring hearing, 214, 215, 217- 
220 

use in measuring noise, 103 
C 

Canal, ear, 111, 112 

Capsule, manometric (Koenig), use in 
recording speech waves, 16, 17 
Cavities 
nose, 4 
throat, 4 

Characteristic frequencies 
of music, 87-98 
of speech, 51-63 
Characteristics of speech 
energy distribution, 76-80 
frequency analyses, 51-63 
pitch, 51-55, 60, 62 
power, 64-75 

typical speech waves, 29-49 
Children, school, methods of measuring 
hearing of, 211-214 
Classification of speech sounds, 6 
Cochlea, 112-117 
Coin click 

relation to sensation units, 206-208 
test for hearing acuity, 206-208 
Components, frequency, see Frequency 
Condenser transmitter 
calibration, 305-307 
description, 27 
Conduction 

bone, see Bone conduction 
nerve, 125-131 
Consonants 

articulation of, see Articulation 
duration of, 57-62 
fricatives, 9, 10 
power, 71, 74 
sensation level, 73 
stops, 9, 10 
unvoiced, 9, 10 
voiced, 9, 10 
Conversion scale 

hearing loss in sensation units to maxi- 
mum hearing distances for speech, 
202-206, 314-316 

hearing loss in sensation units to per- 
cent loss, 201 

hearing loss in sensation units to watch 
tick, acoumeter, tuning fork and 
coin click test results, 202-221 
sensation level to loudness, 229-232 
Cords, vocal 
action, 5 
description, 5 
diagram, 4 

D 

Deafness 

audiograms of typical cases, 200, 217 
effect on articulation, 297-302 
hearing-loss scale, 199-201 
methods of measuring, 198-221 

relation between various tests for, 202- 
221 

Decibel, definition, 68, 69 
Density, relation to pressure, velocity, 
displacement and power in a plane 
wave, derivation of equations for, 
308-311 

Difference tones, 175-180 
effect on pitch, 250-254 
production by ear overload, 312, 313 
Differential intensity sensitivity, 145-151, 159 

Differential pitch sensitivity, 151, 152, 
158-164 

Diphthongs, formation, 8, 9 
Displacement, relation to density, pres- 
sure, velocity and power in a plane 
wave, derivation of equations for, 
308-311 
Distances 

for hearing average speech, 187 
relation between sensation units loss 
and maximum hearing distance for 
speech, 202-206, 219, 314-316 
Distortion 

effect of frequency distortion on articu- 
lation, 279-289 

effect of frequency distortion on pitch, 
245-254 

effect of resonance distortion on articu- 
lation, 287-289, 318, 319 
Drum, ear, 113, 118 
Duration of speech sounds, 56-62 

E 

Ear 

canal, 111, 112 
characteristics, see Hearing 
drum, 113, 118 
functions, 118-125 
inner, description, 112-117 
middle, description, 111, 112 
outer, description, 111, 112 
overload of, 312, 313 
Electrical harmonic analyzer 
description, 88-90 
speech records from, 51-55 


Electrostatic transmitter 
calibration, 305-307 
description, 27 

Energy, frequency distribution in speech, 
76-80 

Equal loudness, curves for, 230 
Eustachian tube, 112, 113 
Evolution of speech, 3, 4 

F 

Feeling, threshold of, 132-143 
Film, motion-picture, use in recording 
speech waves, 24-26 
Filters, electrical 
circuits of typical, 280 
use of, see Frequency distortion and 
Frequency components 
Formation of speech sounds, 5-10 
Frequency 

analyses of distribution in speech, 51-63 
analyses of line noise, 101 
component multiplication, effect on 
articulation, 291-293 
components, effect on pitch and quality 
of musical sounds, 245-254 
distortion, effect on loudness, 233-237 
distortion, effect on articulation, 279- 
289 

energy distribution in speech, 76-80 
level, 153-157 
limits for hearing, 143, 144 
minimum perceptible, differences of, 
151, 152, 158-164 
ranges in music, 87-98 
ranges in speech, 51-63, 76-80 
Frequency of occurrence 
of speech sounds, 81-84 
of syllables, 81-83 
of words, 81-83 
Functions 

of hearing organs, 118-125 
of speech organs, 4, 7-10 

H 

Hammer (ossicle), 112, 113, 118, 121 
Harmonics 

effect on pitch and quality of musical 
sounds, 245-254 


production by ear overload, 312, 313 

subjective, 175-180 

Harmonic analyzer, electrical, 51-55, 88- 
90 

Harmonic theory of vowel production, 

47-51 

Hearing 

binaural beats, 186-196 
description of organs, 111-117 
distances for hearing average speech, 
187 

function of organs, 118-125 
in presence of noise, 185-187 
interference, 167-187 
limits of, 132-143 
location of sound images, 188-196 
loss, see Hearing loss 
loss scale, 199-201 
loudness, 225-245 
masking, 167-187 
methods of testing acuity, 198-221 
nerve conduction, 125-131 
positions on basilar membrane for 
sensing pitches, 164-166 
relation between various tests for, 202- 
221 

scale, 199-201 
sensitivity for intensity differences, 
145-151, 158-164 
sensitivity for pitch differences, 151, 
152, 158-164 
sensitivity, 132-142 
subjective tones, 175-180, 250-254, 312, 313 

theories of, 118-131 
vibration forms of basilar membrane, 
180-185 
Hearing loss 

percent, definition of, 199 
relation between various tests for, 201-221, 314-316 
scale, 199-201 

Helicotrema, 116, 117, 118, 121 
Helmholtz resonator, use in recording 
speech waves, 14 


I 

Images, sound, effect of intensity and 
phase on location of, 188-196 
Inharmonic theory of vowel production, 

47-51 

Initial speech intensity, definition, 70 
Instantaneous speech power, 64-65 
Intelligibility (see also Articulation) 
definition of, 256 
methods of measuring, 264-266 
relation to articulation, 266 
Intensity 

effect on location of sound images, 188-192 

initial speech, 70 
level, 153-157 

level for audibility, 132-142 
level for feeling, 132-143 
minimum perceptible differences, 145-151, 158-164 
of music, 95-98 

of speech, see Speech power 
of speech sounds, effect on articulation, 
270-278 
relation to loudness, 230, 232 
sensitivity of normal ear, see Sensitivity 
Interference 
effect of noise on speech and music, 99, 
101, 102, 185-187 

masking, 167-187 (see also Masking) 
International Phonetic Association, 
alphabet of, 5 

Interpretation, see Articulation and In- 
telligibility 


L 

Language, see Speech sounds 
Larynx, 4, 5 
artificial, 12, 13 
Letter articulation, 255 
relation to syllable articulation, 266, 268 

see Articulation 
Level 

frequency, 153-157 
intensity, 153-157 
phonic, 153-157 



Limits of audition 
lower pitch limit, 141, 143, 144 
methods of measuring, 132-142 
threshold of audition, 132-142 
threshold of feeling, 141-143 
upper pitch limit, 141, 143, 144 
Line noise, 101 
Lips, use in sound production, 4, 7, 8 
Location, binaural, of complex sounds, 
192-196 
of sound images, 188-196 
Loudness 
comparison of binaural and monaural, 
237, 238 
effect on, of eliminating various frequencies, 233-237 
equal, curves for, 230 
losses, computation of, 239-241, 317 
of complex sounds, 231, 232 
of pure tones, 226-231 
relation to intensity, 230, 232 
relation to sensation level, 230, 232 
standard, 225, 226 
theory of, 225-244 
Lungs, use in sound production, 4 

M 

Manometric capsule (Koenig), use in recording 
speech waves, 16, 17 
Masking 
binaural, 172, 173 
effects of complex sounds, 185, 186 
effect of one pure tone upon another, 
167-180 
monaural, 167-172 
subjective tones, 175-180 
Mean speech power, 64-66 
Meatus, auditory, 111-113 
Mechanism of hearing 
description of organs, 111-117 
function of organs, 118-125 
nerve conduction, 125-131 
positions on basilar membrane for 
sensing pitches, 164-166 
theories of hearing, 118-131 
vibration form of basilar membrane, 
180-185 



Mechanism of speaking, see Speech 
sounds and Organs of speech 
Membrane 

basilar, see Basilar membrane 
Reissner's, 114 
tectorial, 115, 116 
tympanic, 112, 113, 118 
Minimum audible pressure, 132-142 
Minimum perceptible differences in frequency, 151, 152, 158-164 
Minimum perceptible differences in intensity, 145-151, 158-164 
Monaural 

location of complex sounds, 193 
loudness, comparison with binaural 
loudness, 237, 238 
masking, 167-172 

Motion-picture film use in recording 
speech waves, 24-26 
Musical instruments, see Musical sounds 
Music, effect of noise on, 99, 101, 102 
Musical sounds 

acoustic spectra of, 90-94 
effect of frequency components on 
pitch and quality of, 245-254 
effect of harmonics on, 245-254 
frequency ranges of, 95-98 
intensity of, 95-98 
physical properties of, 87-98 

N 

National Education Association 
alphabet of, 5 
Nerve conduction, 125-131 
Noise 

audiograms, 104, 106, 186, 299 
definition, 99 

effect on articulation, 297-301 
effect on speech threshold, 73, 185-187 
interference effect of, 99, 101, 102, 185-187 
masking effects, 185-187 
method of measuring, 102-105 
results of surveys, 106, 107 
room noise, 101-106 
telephone line noise, 101 
typical wave form of street noise, 100 





O 

Objective tones, 194 
effect on pitch, 250-254 
Organ of Corti, 115 
Organs of hearing 
description, 111-117 
functions, 118-125 

positions on basilar membrane for 
sensing pitches, 164-166 
vibration form of basilar membrane, 
180-185 

Organs of speech 
description, 4, 5 
diagram, 4 
use, 4, 7-10 
Origin of speech, 3, 4 
Oscillograms of typical speech waves, 29- 
49 

Oscillograph, use in recording speech 
waves, 26-28 

Ossicles, 112, 113, 118, 121 
Oval window, 112, 113, 118, 121 
Overload 

of ear, production of overtones by, 312, 313 

of vacuum tube, effect on articulation, 
290-292 
Overtones 

effect on pitch and quality of musical 
sounds, 245-254 

production by ear overload, 312-313 
P 

Pantomime, influence upon word forma- 
tion, 3 

Passages, vocal, use in sound production, 
4 

Peak speech power, 64-66, 70, 71, 74, 75 

Perception 

of music, see Musical sounds 
of speech, see Speech sounds 
tonal, minimum time for, 152, 153 
Phase, effect on location of sound images, 
188-196 

Phaser, description, 189 
Phonautograph (Koenig and Scott), use 
in recording speech waves, 16, 17 


Phonetic speech power, 64-66, 70, 71, 74, 75 
Phonic level, definition of, 153-157 
Phonodeik (D. C. Miller), use in sound 
recording, 17-19 
Phonograph 
effect of speed of rotation on articulation, 291-293 
electrical recording for, 23 
use for recording speech waves, 19-22 
Phonograph audiometer (No. 4-A) 
description, 105, 211-214 

use in measuring hearing, 211-214 
use in measuring noise, 105 
Phonoscope (Weiss), use in recording 
speech waves, 15, 16 

Photo-electric cell, use in film recording of 
speech, 26 
Pinna, 112 
Pitch 

criterion for determining, 245 f, 

definition of, 245 x 

differential sensitivity for, 151, 152 
158-164 

effect of frequency components on pitt 
of musical sounds, 245-254 
effect of difference tones on, 250-254 
effect of harmonics on, 245-254 
effect on acoustic spectra of vowels, 54, 55

limits for hearing, 141, 143, 144 
of male and female speech, 53, 54, 6 
62 

range in musical instruments, 97 
recognition of, 245-250 
scale, definition of, 153-157
Pitch-range audiometer (Nos. 1-A and
2-A)

description, 104, 216-218 
use in measuring hearing, 216-221 
use in measuring noise, 104 
Power

in music, 95-98
in speech, see Speech power
relation to density, pressure, velocity
and displacement in a plane wave,
derivation of equations for, 308-311




Pressure

for feeling threshold, 132-143 
minimum audible, 132-142 
minimum perceptible differences of, 
145-151, 158-164 
of music, 95-98 

of speech sounds, see Speech power 
relation to density, velocity, displace- 
ment and power in a plane wave, 
derivation of equations for, 308-311

Probability, relation to articulation, 268,
269

Q 

Quality 

effect of harmonics on, 245-254 
of musical sounds, effect of frequency 
components on, 245-250 

R 

Range of frequencies
in music, 95-98
in speech, 51-63, 76-80
see also Frequency
Recognition
of pitch, 245-250
of speech, see Articulation
Recording speech waves
condenser transmitter, 27
Helmholtz resonator, 14
manometric capsule, 16, 17
motion picture, 24-26
oscillograph (high quality), 26-28
phonodeik, 17-19
phonograph, 19-24
phonautograph, 16
phonoscope, 15, 16

photographing condensations and rare- 
factions, 15 
Records of speech waves
frequency analyses, 51-55
typical oscillograms, 29-49
Reissner's membrane, 114
Resonance

characteristic resonant frequencies of
speech, 51-63

distortion, effect on articulation, 287-
289, 318, 319

theory of hearing, 118-125
Resonator, Helmholtz, use in recording 
speech waves, 14 
Reverberation 

effect on articulation, 294-296 
time, definition, 295 
Room noise, see Noise 
Round window, 112, 113, 118, 121, 122 

S 

Scala 

media, 114 

tympani, 112, 114, 121, 122 
vestibuli, 112, 114, 121 
School children, methods of measuring 
hearing of, 211-214 
Sensation area, auditory, 141, 144 
Sensation level 
definition, 68-70 
of speech sounds, 73 
relation to intensity, 230, 232 
relation to loudness, 230, 232 
Sensation unit, definition, 68, 69 
Sensitivity 

differential intensity sensitivity, 145- 
151, 158-164 

differential pitch sensitivity, 151, 152, 
158-164 

methods of testing, 198-221 
of normal ear, 132-142
see also Hearing

Sign language, 3 
Similarity of languages, 4 
Simplified Spelling Board, alphabet of, 5 
Sound articulation, see Articulation 
Sound images, effect of intensity and 
phase on location of, 188-196
Sounds 

of music, see Musical sounds 
of speech, see Speech sounds 
Speech 

characteristics, see Characteristics of 
speech 

effect of noise on, 73, 99, 101, 102
initial speech intensity, 70 
see also Speech sounds 
Spectra, acoustic, see Acoustic spectra 




speech organs, 4, 5, 7-10 
Speech power 
average, 64-71 

frequency distribution, 76-80
instantaneous, 64, 65 
mean, 64-66 

peak, 64-66, 70, 71, 74, 75 
phonetic, 64-66, 70, 71, 74, 75 
syllabic, 64-66, 74 
Speech sounds 

articulation of, see Articulation 
artificial production of, 10-12 
classification, 6 
duration of, 56-62 
formation, 5-10 

frequency analyses, 51-63, 76-80 
frequency of occurrence, 81-84 
origin and evolution, 3, 4 
pitch of, 51-55 

power, 64-74 

recognition of, see Articulation 
sensation level, 73 

methods of measuring recognition of, 
255-268 

theories of vowel production, 46-51 
typical speech waves, 29-49 
voiced and unvoiced, 5 
Speech waves 

frequency analyses of, 51-55 
methods of recording, 14-28 
typical oscillograms of, 29-49 
Standards, see Units 

Steady state (harmonic) theory of vowel 
production, 47-51 
Stirrup (ossicle), 112, 113, 118, 121 
Structure 

of hearing organs, see Organs of Hearing 
of speech organs, see Organs of Speech 
Subjective tones, 175-180 
beats, 194-196 
effect on pitch, 250-254 
production by ear overload, 312, 313 
Summation tones, 175-180
effect on pitch, 250-254 
production by ear overload, 312, 313 
Surveys of noise, 106, 107
Syllable articulation, 255 
relation to letter articulation, 266, 268 


see also Articulation
Syllabic speech power, 64-66, 74 
Syllables, frequency of occurrence, 81

T 

Talking machine, see Phonograph
Tectorial membrane, 115, 116
Telephone line noise, 101
Theories of hearing, 118-131
location of sound images, 188-196
loudness theory, 225-244
positions on basilar membrane for
sensing pitches, 164-166
vibration form of basilar membrane,
180-185

Theories of speech production, see Speech
sounds

Theories of vowel production
harmonic (steady state), 46-51
inharmonic (transient), 47-51
Thermophone 
constants, 305-307 

use in measuring threshold intensities,
134-137

Threshold of audibility, 132-142
effect of noise on, 73, 185-187
Threshold of feeling, 132-143
Tonal perception, minimum time for,
152-153

Tones 

difference, 175-180, 250-254, 312, 313
objective, 194, 250-254
subjective, 175-180, 250-254, 312, 313
summation, 175-180, 250-254, 312, 313
Tongue, use in sound production, 4, 7
Transient (inharmonic) theory of vowel
production, 47-51
Transitionals, 8
Transmission unit, 68, 69 
Transmitter 
condenser, 27, 305-307 
electrostatic, 27, 305-307 
Tube, eustachian, 112, 113
Tuning fork

relation to sensation unit, 208-211
test for hearing acuity, 208-211
Tympanic membrane, 112, 113, 118





U

Units

of hearing loss, 199-201 
of loudness, 225, 226 
sensation, 68, 69 
transmission, 68, 69 
Unvoiced sounds, production, 5

V

Vacuum tube, overload of, effect on arti-
culation, 290-292

Velocity, relation to density, pressure, 
displacement and power in a plane 
wave, derivation of equations for, 
308-311 

Victor Talking Machine, 292
Vocal organs, see Organs of speech
Voiced sounds, production, 5
Vowels

acoustic spectra of, 50-55 
articulation of, see Articulation 
duration of, 56 
power, 71, 74 


pure, 7 
semi, 9 

sensation level, 73 
theories of production, 46-51 
Vowel triangle 
description, 7 
diagram, 6 


W

Watch tick

acoustic spectrum, 208 
relation to sensation units, 206-208, 
217-219

test for hearing acuity, 206-208 
Waves, speech, see Speech waves 
Weber-Fechner law, 145
Window 

oval, 112, 113, 118, 121 
round, 112, 113, 118, 121, 122
Words 

formation of, see Speech sounds 
frequency of occurrence, 81-83 




